A SECOND COURSE 

IN 

CALCULUS 




This book is in the 

ADDISON-WESLEY SERIES IN MATHEMATICS 




SERGE LANG 

Columbia University, New York, New York 





ADDISON-WESLEY PUBLISHING COMPANY, INC. 


READING, MASSACHUSETTS • PALO ALTO • LONDON 





















Foreword 


This volume is a continuation of A First Course in Calculus, and deals 
principally with functions of several variables. 

The first chapter deals with vectors. The rest of the book separates 
naturally into two parts. The first part deals with the calculus of functions
of several variables, and the second part deals with linear algebra.
These are essentially independent, and the short Chapter XIII can be 
omitted entirely, without prejudice to the understanding of the rest of the 
book. However, it affords a beautiful crossing point of the algebra and 
analysis. 

Thus, after covering Chapter I, there are two possibilities for the order 
in which one can read the rest of the book. One may cover immediately 
Chapters II through VIII, which essentially do for functions of several 
variables what the First Course did for functions of one variable. Or, one 
can cover Chapters IX, X, XI, XII, XIV, XV to get an introductory course
in linear algebra. Each part will require approximately one semester. 
Thus the whole book may be used for a year’s work. The order in which 
one covers the two parts will depend on the taste of the instructor, the 
mood of the students, and the requirements of the course. 

The chapters on linear algebra are not meant to give a complete treatment
of the subject. They are meant as an introduction to the notions of
vector space, bases, linear mappings, matrices, and determinants. These 
notions have become so basic in mathematics (both pure and applied), that 
it is desirable to get acquainted with them as early as possible. Here 
again, the mood of the instructor and the degree of sophistication of the 
students will determine to what extent the emphasis in these chapters is 
computational or theoretical. At any rate, introducing students early to 
the notions of linear algebra makes it possible a year later to give better 
courses at a more advanced level on the calculus of functions of several 
variables (including the inverse mapping theorem, differential equations, 
multiple integration, etc.), and linear algebra (including quadratic forms, 
the dual space, etc.). 

At the present level, multiple integration presents a real expository 
problem. It is absolutely impossible to develop any sort of theory coherently
without some linear algebra, determinants, and techniques of
uniformity. Thus I feel it is better to postpone this theory to the third 
year of calculus, which should essentially be a first course in analysis. On 
the other hand, various science courses (e.g. physics) require a minimum of
technique in evaluating double and triple integrals. Thus the chapter on
multiple integration simply states rigorously (without proof) certain 
computational recipes describing this technique. In addition to that, one 
must recognize that part of the purpose of such a chapter is geometric: 
To give the student practice in visualizing some three-dimensional figures 
and their boundaries. Strictly speaking, this is not entirely mathematical, 
but is still regarded as a requirement of the course. Pushed to extremes 
in a course of mathematics, it becomes extremely oppressive. I have tried 
to include just the right amount to carry out the responsibility that the 
course owes to other science courses, and yet preserve the overall coherence 
which I have sought to attain. 

The chapter on complex numbers can be read immediately after Chapter I,
and one could discuss vectors with complex coordinates. The last
three chapters (applications to functions of several variables, determinants, 
and complex numbers) are logically independent, and the order in which 
they are treated permits some variations of emphasis for the course. 

The size of the book is deceptive. Most instructors will find that there is 
more material than can be covered in one year, and some topics, obviously 
less important than others, can be omitted (for instance the discussion of 
parametrization by arc length, or Lagrange multipliers). The chapter on 
determinants has been written in such a way that a student can learn what 
determinants are, and can learn to compute them, without reading the 
proofs. In other words, determinants are characterized by their properties.
To go through the proofs requires a fairly high level of abstraction or computation
which is unavoidable. Thus the proofs may be omitted in the
general case, although it is recommended that students go through them 
explicitly in the cases of dimension 2 and 3. 

At some point during the first or second course, the instructor should
give a discussion of proofs by induction. I find it inadvisable to give such a 
discussion at the very beginning. It is better to carry out such proofs in a 
natural context for a while, and then point out formally exactly what is 
involved. Hence I have stated the pattern of induction in an appendix, 
and anyone teaching the course can decide for himself precisely when he 
wants to discuss it. In teaching induction, I believe that (as with foreign 
languages) it is better to learn how to use the language first, and then 
formalize it, i.e. give its rules of grammar and syntax. Needless to say, 
the brighter the students are, the earlier they should be exposed to the 
formalization. 

There is no reason why some of the contents of this book should not 
soon be taught in the secondary schools. This applies especially to Chapter I,
to the calculus of matrices, determinants (suitably axiomatized),
and complex numbers. The chapter on complex numbers could also be 
covered in the First Course. 





It seemed advisable to insert an appendix on the sine and cosine functions,
and angles, to show how their theory can be derived purely analytically.
The section on sine and cosine could be read with the First Course,
but to discuss angles, it is necessary to introduce some notions of linear
algebra.

I have made great efforts to make the style of presentation as naive 
and unpretentious as possible, deliberately avoiding introducing more 
vocabulary than is necessary to understand the concepts with which we 
deal. In linear algebra, certain abstractions are both unavoidable and 
desirable. I hope that the reader will find neither too much nor too little. 

Serge Lang 
New York, 1964 




Contents 


Chapter I
Vectors

1. Definition of points in n-space . . . 1
2. Vectors . . . 4
3. Scalar product . . . 6
4. The norm of a vector . . . 8
5. Lines and planes . . . 13
6. The cross product . . . 17

Chapter II
Differentiation of Vectors

1. Derivative . . . 19
2. The chain rule and applications . . . 24

Chapter III
Functions of Several Variables

1. Graphs and level curves . . . 28
2. Partial derivatives . . . 31
3. Differentiability and gradient . . . 34

Chapter IV
The Chain Rule and the Gradient

1. The chain rule . . . 40
2. Tangent plane . . . 43
3. Directional derivative . . . 45
4. Conservation law . . . 47

Chapter V
Potential Functions and Line Integrals

1. Potential functions . . . 50
2. Line integrals . . . 56


Chapter VI
Taylor's Formula

1. Repeated partial derivatives . . . 60
2. Partial differential operators . . . 64
3. Taylor's formula . . . 70
4. Estimate for the remainder . . . 75

Chapter VII
Maximum and Minimum

1. Critical points . . . 79
2. The quadratic form . . . 81
3. Boundary points . . . 82
4. Lagrange multipliers . . . 84

Chapter VIII
Multiple Integrals

1. Double integrals . . . 88
2. Polar coordinates . . . 95
3. Triple integrals . . . 98

Chapter IX
Vector Spaces

1. Definitions . . . 105
2. Bases . . . 108



Chapter X
Linear Equations and Bases

1. Matrices . . . 113
2. Homogeneous linear equations . . . 116
3. Invariance of dimension . . . 120
4. Orthonormal bases . . . 122
5. A geometric interpretation . . . 127


Chapter XI
Linear Mappings

1. Mappings . . . 129
2. Linear mappings . . . 132
3. The kernel of a linear map . . . 138
4. Kernel and image . . . 140
5. The rank of a matrix . . . 141
6. Orthogonal maps . . . 144

Chapter XII
Linear Maps and Matrices

1. The linear map associated with a matrix . . . 146
2. The matrix associated with a linear map . . . 150
3. Composition of linear mappings . . . 154
4. Multiplication of matrices . . . 157
5. Applications to linear equations . . . 162

Chapter XIII
Applications to Functions of Several Variables

1. The derivative as a linear map . . . 164
2. The Jacobian matrix . . . 167
3. The chain rule . . . 171

Chapter XIV
Determinants

1. Determinants of order 2 . . . 175
2. Properties of determinants . . . 176
3. Cramer's rule
4. Inverse of a matrix . . . 184
5. Proofs of some properties . . . 186
6. Uniqueness . . . 187
7. Determinant of a transpose . . . 189
8. Existence . . . 191
9. Determinant of a product . . . 192

Chapter XV
Complex Numbers

1. Definition . . . 194
2. Polar form . . . 197

Appendix 1. Induction . . . 201






























Appendix 2. ε and δ again . . . 205

Appendix 3. Sine, Cosine, and Angle . . . 209

1. The functions sin and cos . . . 209
2. Angles

Answers to Exercises . . . 223

Index . . . 241








CHAPTER I 

Vectors 


The concept of a vector is basic for the whole course. It provides 
geometric motivation for everything that follows. Hence the properties of 
vectors, both algebraic and geometric, will be discussed in full. 

The cross product is included for the sake of completeness. It is never 
used in the rest of the book. It is the only aspect of the theory of vectors 
which is valid only in three-dimensional space (not 2, nor 4, nor n-dimensional
space). One significant feature of almost all the statements and
proofs of this book (except for those concerning the cross product), is that
they are neither easier nor harder to prove in 3 or n-space than they are
in 2-space.

§1. Definition of points in n-space 

We have seen that a number can be used to represent a point on a line, 
once a unit length is selected. 

A pair of numbers (i.e. a couple of numbers) (x, y) can be used to 
represent a point in the plane. 

We now observe that a triple of numbers (x, y, z) can be used to represent
a point in space, that is 3-dimensional space, or 3-space. We simply
introduce one more axis. The next picture illustrates this.

[Figure 1: coordinate axes in 3-space]

Instead of using x, y, z we could also use $(x_1, x_2, x_3)$. The line could be
called 1-space, and the plane could be called 2-space.



Thus we can say that a single number represents a point in 1-space. A 
couple represents a point in 2-space. A triple represents a point in 3-space. 

Although we cannot draw a picture to go further, there is nothing to 
prevent us from considering a quadruple of numbers 

$(x_1, x_2, x_3, x_4)$

and decreeing that this is a point in 4-space. A quintuple would be a 
point in 5-space, then would come a sextuple, septuple, octuple,.... 

We let ourselves be carried away and define a point in n-space to be an
n-tuple of numbers

$(x_1, x_2, \ldots, x_n)$,

if n is a positive integer. We shall denote such an n-tuple by a capital
letter X, and try to keep small letters for numbers and capital letters for
points. We call the numbers $x_1, \ldots, x_n$ the coordinates of the point X.
We shall now define how to add points. If A, B are two points, say

$A = (a_1, \ldots, a_n), \quad B = (b_1, \ldots, b_n)$,

then we define A + B to be the point whose coordinates are

$(a_1 + b_1, \ldots, a_n + b_n)$.

For example, in the plane, if A = (1, 2) and B = (-3, 5) then A + B =
(-2, 7). In 3-space, if $A = (-1, \pi, 3)$ and $B = (\sqrt{2}, 7, -2)$ then

$A + B = (\sqrt{2} - 1, \pi + 7, 1)$.

Furthermore, if c is any number, we define cA to be the point whose
coordinates are

$(ca_1, \ldots, ca_n)$.

If A = (2, -1, 5) and c = 7 then cA = (14, -7, 35).
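
The two definitions above can be tried out numerically. The following is only a minimal sketch: it represents a point in n-space as a Python tuple, and the function names `add` and `scale` are our own choices for illustration, not notation from the text.

```python
import math

def add(a, b):
    """Return A + B, the coordinate-wise sum of two n-tuples."""
    if len(a) != len(b):
        raise ValueError("A and B must lie in the same n-space")
    return tuple(x + y for x, y in zip(a, b))

def scale(c, a):
    """Return cA, each coordinate of A multiplied by the number c."""
    return tuple(c * x for x in a)

# The examples from the text.
print(add((1, 2), (-3, 5)))    # (-2, 7)
print(scale(7, (2, -1, 5)))    # (14, -7, 35)

# The 3-space example; pi and sqrt(2) are floating-point approximations here.
print(add((-1, math.pi, 3), (math.sqrt(2), 7, -2)))
```

Tuples are used because they are immutable, which matches the idea that a point is a fixed n-tuple of numbers.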

We observe that the following rules are satisfied: 

(1) $(A + B) + C = A + (B + C)$.

(2) $A + B = B + A$.

(3) $c(A + B) = cA + cB$.

(4) If $c_1, c_2$ are numbers, then

$(c_1 + c_2)A = c_1A + c_2A$ and $(c_1c_2)A = c_1(c_2A)$.

(5) If we let $O = (0, \ldots, 0)$ be the point all of whose coordinates are 0,
then $O + A = A + O = A$ for all A.


(6) $1 \cdot A = A$, and if we denote by $-A$ the n-tuple $(-1)A$, then

$A + (-A) = O$.

[Instead of writing A + (-B), we shall frequently write A - B.]

All these properties are very simple to prove, and we suggest that you
verify them on some examples.

We shall give in detail the proof of property (3). 

Let $A = (a_1, \ldots, a_n)$ and $B = (b_1, \ldots, b_n)$. Then

$A + B = (a_1 + b_1, \ldots, a_n + b_n)$

and

$c(A + B) = (c(a_1 + b_1), \ldots, c(a_n + b_n))$

$= (ca_1 + cb_1, \ldots, ca_n + cb_n)$

$= cA + cB$,

this last step being true by definition of addition of n-tuples. 

The other proofs are left as exercises. 

Note. Do not confuse the number 0 and the n-tuple (0, ..., 0). We
usually denote this n-tuple by O, and also call it zero, because no difficulty
can occur in practice.

We shall now interpret addition and multiplication by numbers geometrically
in the plane (you can visualize simultaneously what happens
in 3-space).

Take an example. Let A = (2,3) and B = (—1,1). Then A + B = 
(1,4). The figure looks like a parallelogram (Fig. 2). 




Take another example. Let A = (3, 1) and B = (1, 2). Then A + B =
(4, 3). We see again that the geometric representation of our addition
looks like a parallelogram (Fig. 3).





What is the representation of multiplication by a number? Let
A = (1, 2) and c = 3. Then cA = (3, 6) (Fig. 4(a)).

Multiplication by 3 amounts to stretching A by 3. Similarly, $\frac{1}{2}A$
amounts to stretching A by $\frac{1}{2}$, i.e. shrinking A to half its size. In general,
if t is a number, t > 0, we interpret tA as a point in the same direction
as A from the origin, but t times the distance.

Multiplication by a negative number reverses the direction. Thus -3A
would be represented as in Fig. 4(b).



Exercises 

Find A + B, A - B, 3A, -2B in each of the following cases.

1. A = (2, -1), B = (-1, 1)        2. A = (-1, 3), B = (0, 4)

3. A = (2, -1, 5), B = (-1, 1, 1)        4. A = (-1, -2, 3), B = (-1, 3, -4)

5. $A = (\pi, 3, -1)$, $B = (2\pi, -3, 7)$        6. $A = (15, -2, 4)$, $B = (\pi, 3, -1)$

7. Draw the points of Exercises 1 through 4 on a sheet of graph paper.

8. Let A, B be as in Exercise 1. Draw the points A + 2B, A + 3B, A - 2B,
A - 3B, $A + \frac{1}{2}B$ on a sheet of graph paper.

§2. Vectors 

We define a located vector to be a pair of points which we write $\overrightarrow{AB}$.
(This is not a product.) We visualize this as an arrow between A and B.
We call A the beginning point and B the end point of the located vector
(Fig. 5).

How are the coordinates of B obtained from those of A? We observe
that in the plane,

$b_1 = a_1 + (b_1 - a_1)$.

Similarly,

$b_2 = a_2 + (b_2 - a_2)$.


This means that

$B = A + (B - A)$.


Let $\overrightarrow{AB}$ and $\overrightarrow{CD}$ be two located vectors. We shall say that they are
equivalent if B - A = D - C. Every located vector $\overrightarrow{AB}$ is equivalent
to one whose beginning point is the origin, because $\overrightarrow{AB}$ is equivalent to
$\overrightarrow{O(B - A)}$. Clearly this is the only located vector whose beginning point
is the origin and which is equivalent to $\overrightarrow{AB}$. If you visualize the parallelogram
law in the plane, then it is clear that equivalence of two located
vectors can be interpreted geometrically by saying that the lengths of the
line segments determined by the pair of points are equal, and that the
"directions" in which they point are the same.

In the next figure, we have drawn the located vectors $\overrightarrow{O(B - A)}$ and $\overrightarrow{AB}$.


Figure 6 



Given a located vector $\overrightarrow{OC}$ whose beginning point is the origin, we shall
say that it is located at the origin. Given any located vector $\overrightarrow{AB}$, we shall
say that it is located at A.

A located vector at the origin is entirely determined by its end point.
In view of this, we shall call an n-tuple either a point or a vector, depending
on the interpretation which we have in mind.

Two located vectors $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ are said to be parallel if there is a
number $c \neq 0$ such that B - A = c(Q - P). They are said to have the
same direction if there is a number c > 0 such that B - A = c(Q - P),
and to have opposite direction if there is a number c < 0 such that
B - A = c(Q - P). In a similar manner, any definition made concerning
n-tuples can be carried over to located vectors. For instance, in the
next section, we shall define what it means for n-tuples to be perpendicular.
Then we can say that two located vectors $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ are perpendicular
if B - A is perpendicular to Q - P. In the next figure, we have drawn
a picture of such vectors in the plane.



§3. Scalar product

It is understood that throughout a discussion we select vectors always
in the same n-dimensional space.

Let $A = (a_1, \ldots, a_n)$ and $B = (b_1, \ldots, b_n)$ be two vectors. We
define their scalar or dot product $A \cdot B$ to be

$a_1b_1 + \cdots + a_nb_n$.

This product is a number. For instance if

A = (1, 3, -2) and B = (-1, 4, -3)

then

$A \cdot B = -1 + 12 + 6 = 17$.

For the moment, we do not give a geometric interpretation to this scalar
product. We shall do this later. We derive first some important properties.
The basic ones are:

SP 1. We have $A \cdot B = B \cdot A$.

SP 2. If A, B, C are three vectors then

$A \cdot (B + C) = A \cdot B + A \cdot C = (B + C) \cdot A$.

SP 3. If x is a number, then

$(xA) \cdot B = x(A \cdot B)$ and $A \cdot (xB) = x(A \cdot B)$.

SP 4. If A = O is the zero vector, then $A \cdot A = 0$, and otherwise
$A \cdot A > 0$.
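
Before reading the proofs, it may help to see the four properties hold on a concrete example. This is a sketch only, with vectors stored as tuples of integers so that all arithmetic is exact; the helper names `dot`, `add` and `scale`, and the third vector C, are our own choices for illustration.

```python
def dot(a, b):
    """Scalar product a_1*b_1 + ... + a_n*b_n of two n-tuples."""
    if len(a) != len(b):
        raise ValueError("A and B must lie in the same n-space")
    return sum(x * y for x, y in zip(a, b))

def add(a, b):
    """Coordinate-wise sum A + B."""
    return tuple(x + y for x, y in zip(a, b))

def scale(c, a):
    """The multiple cA."""
    return tuple(c * x for x in a)

A, B, C = (1, 3, -2), (-1, 4, -3), (2, 0, 5)

print(dot(A, B))                                    # 17, as in the text
print(dot(A, B) == dot(B, A))                       # SP 1
print(dot(A, add(B, C)) == dot(A, B) + dot(A, C))   # SP 2
print(dot(scale(4, A), B) == 4 * dot(A, B))         # SP 3
print(dot(A, A) > 0, dot((0, 0, 0), (0, 0, 0)))     # SP 4
```

A check on one example is of course not a proof; it only illustrates what the properties assert.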



We shall now prove these properties. 

Concerning the first, we have

$a_1b_1 + \cdots + a_nb_n = b_1a_1 + \cdots + b_na_n$

because for any two numbers a, b, we have ab = ba. This proves the
first property.

For SP 2, let $C = (c_1, \ldots, c_n)$. Then

$B + C = (b_1 + c_1, \ldots, b_n + c_n)$

and

$A \cdot (B + C) = a_1(b_1 + c_1) + \cdots + a_n(b_n + c_n)$

$= a_1b_1 + a_1c_1 + \cdots + a_nb_n + a_nc_n$.

Reordering the terms yields

$a_1b_1 + \cdots + a_nb_n + a_1c_1 + \cdots + a_nc_n$,

which is none other than $A \cdot B + A \cdot C$. This proves what we wanted.
We leave property SP 3 as an exercise.

Finally, for SP 4, we observe that if one coordinate $a_i$ of A is not equal
to 0, then there is a term $a_i^2 \neq 0$ and $a_i^2 > 0$ in the scalar product

$A \cdot A = a_1^2 + \cdots + a_n^2$.

Since every term is $\geq 0$, it follows that the sum is $> 0$, as was to be
shown.

In much of the work which we shall do concerning vectors, we shall use
only the ordinary properties of addition, multiplication by numbers, and
the four properties of the scalar product. We shall give a formal discussion
of these later. For the moment, observe that there are other objects with
which you are familiar and which can be added, subtracted, and multiplied
by numbers, for instance the continuous functions on an interval [a, b]
(cf. Exercise 5).

Instead of writing $A \cdot A$ for the scalar product of a vector with itself,
it will be convenient to write also $A^2$. (This is the only instance when we
allow ourselves such a notation. Thus $A^3$ has no meaning.) As an exercise,
verify the following identities:

$(A + B)^2 = A^2 + 2A \cdot B + B^2$,

$(A - B)^2 = A^2 - 2A \cdot B + B^2$.


We define two vectors A, B to be perpendicular (or as we shall also say,
orthogonal) if $A \cdot B = 0$. For the moment, it is not clear that in the
plane, this definition coincides with our intuitive geometric notion of
perpendicularity. We shall convince you that it does in the next section.





Exercises 

1. Find $A \cdot A$ for each one of the n-tuples of Exercises 1 through 6 of §1.

2. Find $A \cdot B$ for each one of the n-tuples as above.

3. Using only the four properties of the scalar product, verify in detail the
rules giving $(A + B)^2$ and $(A - B)^2$.

4. Which of the following pairs of vectors are perpendicular?

(a) (1, -1, 1) and (2, 1, 5)        (b) (1, -1, 1) and (2, 3, 1)

(c) (-5, 2, 7) and (3, -1, 2)        (d) $(\pi, 2, 1)$ and $(2, -\pi, 0)$

5. Consider continuous functions on the interval [-1, 1]. Define the scalar
product of two such functions f, g to be

$\int_{-1}^{1} f(x)g(x)\,dx$.

We denote this integral also by $\langle f, g \rangle$. Verify that the four rules for a scalar
product are satisfied, in other words, show that:

SP 1. $\langle f, g \rangle = \langle g, f \rangle$.

SP 2. $\langle f, g + h \rangle = \langle f, g \rangle + \langle f, h \rangle$.

SP 3. $\langle cf, g \rangle = c\langle f, g \rangle$.

SP 4. If f = 0 then $\langle f, f \rangle = 0$ and if $f \neq 0$ then $\langle f, f \rangle > 0$.

6. If $f(x) = x$ and $g(x) = x^2$, what are $\langle f, f \rangle$, $\langle g, g \rangle$, and $\langle f, g \rangle$?

7. Consider continuous functions on the interval $[-\pi, \pi]$. Define a scalar
product similar to the above for this interval. Show that the functions sin nx
and cos mx are orthogonal for this scalar product (m, n being integers).

8. Let A be a vector perpendicular to every vector X. Show that A = O.

§4. The norm of a vector

The following inequality is called the Schwarz inequality and is fundamental
in the theory of vectors.

Theorem 1. Let A, B be two vectors. Then

$(A \cdot B)^2 \leq (A \cdot A)(B \cdot B)$.

Proof. Let $x = B \cdot B$ and $y = -A \cdot B$. Then by SP 4 we have

$0 \leq (xA + yB) \cdot (xA + yB)$.

We multiply out the right-hand side of this inequality and get

$0 \leq x^2(A \cdot A) + 2xy(A \cdot B) + y^2(B \cdot B)$.

Substituting the values for x and y yields

$0 \leq (B \cdot B)^2(A \cdot A) - 2(B \cdot B)(A \cdot B)^2 + (A \cdot B)^2(B \cdot B)$.


If B = O then the inequality of the theorem is obvious, both sides being
equal to 0. If $B \neq O$, then $B \cdot B \neq 0$ and we can divide this last expression
by $B \cdot B$. We then obtain

$0 \leq (A \cdot A)(B \cdot B) - (A \cdot B)^2$.

Transposing the term $-(A \cdot B)^2$ to the other side of the inequality concludes
the proof.

We define the norm, or length, of a vector A, and denote by $\|A\|$, the
number

$\|A\| = \sqrt{A \cdot A}$.

Since $A \cdot A \geq 0$, we can take the square root. Furthermore, we note
immediately that $\|A\| \neq 0$ if $A \neq O$.

In terms of coordinates, we see that

$\|A\| = \sqrt{a_1^2 + \cdots + a_n^2}$,

and therefore that when n = 2 or n = 3, this coincides with our intuitive
notion (derived from the Pythagoras theorem) of length.

In view of our definition, we can rewrite the inequality of Theorem 1 in
the form

$|A \cdot B| \leq \|A\| \, \|B\|$

by taking the square root of both sides. We shall use it in this form in
the proof of the next theorem.
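
As an illustration, the norm and both forms of the Schwarz inequality can be evaluated on the vectors used earlier for the scalar product. This is a minimal sketch; the function names `dot` and `norm` are ours, and the second comparison uses floating-point square roots.

```python
import math

def dot(a, b):
    """Scalar product of two n-tuples of the same length."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Length of A, the square root of A . A."""
    return math.sqrt(dot(a, a))

A, B = (1, 3, -2), (-1, 4, -3)

print(norm((3, 4)))                              # 5.0, the Pythagorean length
print(dot(A, B) ** 2, dot(A, A) * dot(B, B))     # 289 364
print(dot(A, B) ** 2 <= dot(A, A) * dot(B, B))   # Theorem 1, exact integers
print(abs(dot(A, B)) <= norm(A) * norm(B))       # the same inequality with norms
```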

Theorem 2. Let A, B be vectors. Then

$\|A + B\| \leq \|A\| + \|B\|$.

Proof. Both sides of this inequality are positive or 0. Hence it will
suffice to prove that their squares satisfy the desired inequality, in other
words,

$(A + B) \cdot (A + B) \leq (\|A\| + \|B\|)^2$.

To do this, we consider

$(A + B) \cdot (A + B) = A \cdot A + 2A \cdot B + B \cdot B$.

In view of our previous result, this satisfies the inequality

$\leq \|A\|^2 + 2\|A\| \, \|B\| + \|B\|^2$,

and the right-hand side is none other than

$(\|A\| + \|B\|)^2$.

Our theorem is proved.





Theorem 2 is known as the triangle inequality. (Cf. Exercise 11.)

Theorem 3. Let x be a number. Then

$\|xA\| = |x| \, \|A\|$

(absolute value of x times the length of A).

Proof. By definition, we have

$\|xA\|^2 = (xA) \cdot (xA)$,

which is equal to

$x^2(A \cdot A)$

by the properties of the scalar product. Taking the square root now yields
what we want.
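
Theorems 2 and 3 can be spot-checked in the same way. The sketch below assumes vectors stored as tuples; the helper names are ours, and `math.isclose` is used for the equality of Theorem 3 because square roots are computed in floating point.

```python
import math

def dot(a, b):
    """Scalar product of two n-tuples of the same length."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Length of A, the square root of A . A."""
    return math.sqrt(dot(a, a))

def add(a, b):
    """Coordinate-wise sum A + B."""
    return tuple(x + y for x, y in zip(a, b))

def scale(c, a):
    """The multiple cA."""
    return tuple(c * x for x in a)

A, B = (1, 3, -2), (-1, 4, -3)

# Theorem 2 (triangle inequality): ||A + B|| <= ||A|| + ||B||.
print(norm(add(A, B)) <= norm(A) + norm(B))

# Theorem 3: ||xA|| = |x| ||A||, here with x = -2.
print(math.isclose(norm(scale(-2, A)), abs(-2) * norm(A)))
```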

We shall say that a vector U is a unit vector if $\|U\| = 1$. Given any
vector A, let $a = \|A\|$. If $a \neq 0$ then

$\dfrac{1}{a} A$

is a unit vector, because

$\left\| \dfrac{1}{a} A \right\| = \dfrac{1}{a} \, a = 1$.

We shall say that two vectors A, B (neither of which is O) have the
same direction if there is a number c > 0 such that cA = B. In view of
this definition, we see that the vector

$\dfrac{1}{\|A\|} A$

is a unit vector in the direction of A (provided $A \neq O$).

We mention in passing that two vectors A, B (neither of which is O)
have opposite directions if there is a number c < 0 such that cA = B.

Let A, B be two n-tuples. We define the distance between A and B
to be $\|A - B\| = \sqrt{(A - B) \cdot (A - B)}$. This definition coincides with
our geometric intuition when A, B are points in the plane.
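
A short numerical sketch of the unit vector in the direction of A and of the distance between two points follows. The names `unit` and `distance` are our own, chosen for illustration; the zero vector is rejected explicitly because it has no direction.

```python
import math

def dot(a, b):
    """Scalar product of two n-tuples of the same length."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Length of A, the square root of A . A."""
    return math.sqrt(dot(a, a))

def unit(a):
    """Unit vector (1/||A||) A in the direction of a non-zero vector A."""
    length = norm(a)
    if length == 0:
        raise ValueError("the zero vector has no direction")
    return tuple(x / length for x in a)

def distance(a, b):
    """Distance ||A - B|| between two n-tuples."""
    return norm(tuple(x - y for x, y in zip(a, b)))

print(unit((3, 4)))                # (0.6, 0.8)
print(distance((1, 2), (4, 6)))    # 5.0
print(distance((4, 6), (1, 2)))    # 5.0, since ||A - B|| = ||B - A||
```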



[Figure 8: the segment between A and B, of length $\|A - B\| = \|B - A\|$]

[Figure 9]

We are also in position to justify our definition of perpendicularity.
Given A, B in the plane, the condition that

$\|A + B\| = \|A - B\|$

(illustrated in Fig. 9b) coincides with the geometric property that A
should be perpendicular to B. This condition is equivalent with

$(A + B) \cdot (A + B) = (A - B) \cdot (A - B)$

(take the square of each side), and expanding out, this equality is equivalent
with

$A \cdot A + 2A \cdot B + B \cdot B = A \cdot A - 2A \cdot B + B \cdot B$.

Making cancellations, we obtain the equivalent condition

$4A \cdot B = 0$

or

$A \cdot B = 0$.

Let A, B be two vectors and $B \neq O$. Suppose that we can find a number
c such that A - cB is perpendicular to B, or in other words,

$(A - cB) \cdot B = 0$.

We then obtain

$A \cdot B = cB \cdot B$,

and therefore

$c = \dfrac{A \cdot B}{B \cdot B}$.

Thus the number c is uniquely determined by our condition of perpendicularity.
Conversely, for this number c, we clearly have $(A - cB) \cdot B = 0$.
We define cB to be the projection of A along B. If B is a unit vector,
then we have simply

$c = A \cdot B$.
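
The construction of the projection translates directly into a few lines of code. This is a sketch under the assumption that vectors are tuples of numbers; the name `projection` and the sample vectors are ours, chosen so that the arithmetic is exact.

```python
def dot(a, b):
    """Scalar product of two n-tuples of the same length."""
    return sum(x * y for x, y in zip(a, b))

def projection(a, b):
    """Projection cB of A along B, where c = (A . B) / (B . B) and B is non-zero."""
    bb = dot(b, b)
    if bb == 0:
        raise ValueError("B must not be the zero vector")
    c = dot(a, b) / bb
    return tuple(c * x for x in b)

A, B = (2, 3), (4, 0)
P = projection(A, B)
print(P)                             # (2.0, 0.0)

# A - cB should be perpendicular to B.
residual = tuple(x - y for x, y in zip(A, P))
print(dot(residual, B))              # 0.0
```

With other vectors the last scalar product may differ from 0 by a small rounding error, since c is computed by floating-point division.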







Our construction has an immediate interpretation in the plane, which gives
us a geometric interpretation for the scalar product. Namely, assume
$A \neq O$ and look at the angle $\theta$ between A and B. Then from plane
geometry we see that

$\cos\theta = \dfrac{c\|B\|}{\|A\|}$,

or substituting the value for c obtained above,

$A \cdot B = \|A\| \, \|B\| \cos\theta$.

In view of Theorem 1, we know that in n-space, the number

$\dfrac{A \cdot B}{\|A\| \, \|B\|}$

has absolute value $\leq 1$. Consequently,

$-1 \leq \dfrac{A \cdot B}{\|A\| \, \|B\|} \leq 1$,

and there exists a unique angle $\theta$ such that $0 \leq \theta \leq \pi$, and such that

$\cos\theta = \dfrac{A \cdot B}{\|A\| \, \|B\|}$.

We define this angle to be the angle between A and B.
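
The definition of the angle can be computed with the inverse cosine. The sketch below is one possible way to do it; the name `angle` is ours, and the clamping step is our own precaution, added because floating-point rounding can push the quotient slightly outside the interval from -1 to 1 that Theorem 1 guarantees in exact arithmetic.

```python
import math

def dot(a, b):
    """Scalar product of two n-tuples of the same length."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Length of A, the square root of A . A."""
    return math.sqrt(dot(a, a))

def angle(a, b):
    """Angle theta between non-zero vectors, with 0 <= theta <= pi."""
    denominator = norm(a) * norm(b)
    if denominator == 0:
        raise ValueError("both vectors must be non-zero")
    cos_theta = dot(a, b) / denominator
    cos_theta = max(-1.0, min(1.0, cos_theta))   # guard against rounding
    return math.acos(cos_theta)

print(angle((1, 0), (0, 1)))      # perpendicular vectors: about pi / 2
print(angle((1, 1), (-2, -2)))    # opposite directions: about pi
```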


Exercises 

1. Find the length of the vector A in Exercises 1 through 6 of §1.

2. Find the length of the vector B in Exercises 1 through 6 of §1.

3. Find the projection of A along B in Exercises 1 through 6 of §1.

4. Find the projection of B along A in those exercises.

5. In Exercise 6 of §3, find the projection of f along g and the projection of g
along f, using the same definition of projection that has been given in the text
(and did not refer to coordinates).

6. Find the norm of the functions sin 3x and cos x, with respect to the scalar
product on the interval $[-\pi, \pi]$ given by the integral.

7. Find the norm of the constant function 1 on the interval $[-\pi, \pi]$.

8. Find the norm of the constant function 1 on the interval [-1, 1].

9. Let $A_1, \ldots, A_r$ be non-zero vectors which are mutually perpendicular, in
other words $A_i \cdot A_j = 0$ if $i \neq j$. Let $c_1, \ldots, c_r$ be numbers such that

$c_1A_1 + \cdots + c_rA_r = O$.

Show that all $c_i = 0$.



10. Let A, B be two non-zero vectors in n-space. Let $\theta$ be the angle between
them. If $\cos\theta = 1$, show that A and B have the same direction. If $\cos\theta = -1$,
show that A and B have opposite direction.

11. If A, B are two vectors in n-space, denote by d(A, B) the distance between
A and B, i.e. $d(A, B) = \|B - A\|$. Show that d(A, B) = d(B, A), and that
for any three vectors A, B, C we have

$d(A, B) \leq d(A, C) + d(B, C)$.

§5. Lines and planes 

We define the parametric equation of a straight line passing through a
point P in the direction of a vector $A \neq O$ to be

$X = P + tA$,

where t runs through all numbers.

Suppose that we work in the plane, and write the coordinates of a
point X as (x, y). Let P = (p, q) and A = (a, b). Then in terms of the
coordinates, we can write

$x = p + ta, \quad y = q + tb$.

We can then eliminate t and obtain the usual equation relating x and y.

For example, let P = (2, 1) and A = (-1, 5). Then the parametric
equation of the line through P in the direction of A gives us

$x = 2 - t, \quad y = 1 + 5t$.

Multiplying the first equation by 5 and adding yields

$5x + y = 11$,

which is familiar.
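
The example can be checked by generating a few points X = P + tA and confirming that each satisfies 5x + y = 11. This is only an illustrative sketch; the function name `point_on_line` and the chosen values of t are ours.

```python
def point_on_line(p, a, t):
    """Return the point X = P + tA on the line through P in the direction of A."""
    return tuple(pi + t * ai for pi, ai in zip(p, a))

P, A = (2, 1), (-1, 5)

for t in (-2, 0, 1, 3):
    x, y = point_on_line(P, A, t)
    # The last column is 5x + y, which equals 11 for every t.
    print(t, (x, y), 5 * x + y)
```

The same function works unchanged for points and directions in 3-space or n-space, where the parametric form is the description used in the text.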

In higher-dimensional space, we cannot eliminate t in this manner, and
thus the parametric equation is the only one available to describe a
straight line.




However, we can describe planes by an equation analogous to the single
equation of the line. We proceed as follows.

Let P be a point, N a vector $\neq O$. We define the hyperplane passing
through P perpendicular to N to be the collection of all points X such
that X - P is perpendicular to N, thus:

$(X - P) \cdot N = 0$,

which can also be written as

$X \cdot N = P \cdot N$.

We have drawn a typical situation in 3-space in Fig. 12.

Instead of saying that N is perpendicular to the plane, one also says
that N is normal to the plane.

Let t be a number $\neq 0$. Then the set of points X such that

$(X - P) \cdot N = 0$

coincides with the set of points X such that

$(X - P) \cdot tN = 0$.

Thus we may say that our plane is the plane passing through P and
perpendicular to the line in the direction of N. To find the equation of
the plane, we could use any vector tN (with $t \neq 0$) instead of N.

In 3-space, we get an ordinary plane. For example, let P = (2, 1, -1)
and N = (-1, 1, 3). Then the equation of the plane passing through P
and perpendicular to N is

$-x + y + 3z = -2 + 1 - 3$

or

$-x + y + 3z = -4$.
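
The recipe for the equation of a hyperplane amounts to computing the single number P · N. The following sketch stores a hyperplane as the pair (N, d) with d = P · N; this representation and the names `plane_through` and `lies_on` are assumptions made for illustration, and integer coordinates are used so that the test for membership is exact.

```python
def dot(a, b):
    """Scalar product of two n-tuples of the same length."""
    return sum(x * y for x, y in zip(a, b))

def plane_through(p, n):
    """Return (N, d) describing the hyperplane X . N = d through P, normal to N."""
    if all(x == 0 for x in n):
        raise ValueError("N must not be the zero vector")
    return n, dot(p, n)

def lies_on(x, plane):
    """Check whether the point X satisfies X . N = d."""
    n, d = plane
    return dot(x, n) == d

plane = plane_through((2, 1, -1), (-1, 1, 3))
print(plane)                          # ((-1, 1, 3), -4), i.e. -x + y + 3z = -4
print(lies_on((2, 1, -1), plane))     # True: P itself lies in the plane
print(lies_on((4, 0, 0), plane))      # True: -4 + 0 + 0 = -4
print(lies_on((0, 0, 0), plane))      # False
```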





Observe that in 2-space, with X = (x, y), we are led to the equation
of the line in the ordinary sense. For example, the equation of the line
passing through (4, -3) and perpendicular to (-5, 2) is

$-5x + 2y = -20 - 6 = -26$.

We are now in position to interpret the coefficients (-5, 2) of x and y
in this equation. They give rise to a vector perpendicular to the line. In
any equation

$ax + by = c$

the vector (a, b) is perpendicular to the line determined by the equation.
Similarly, in 3-space, the vector (a, b, c) is perpendicular to the plane
determined by the equation

$ax + by + cz = d$.

Two vectors A, B are said to be parallel if there exists a number $c \neq 0$
such that cA = B. Two lines are said to be parallel if, given two distinct
points $P_1, Q_1$ on the first line and $P_2, Q_2$ on the second, the vectors

$P_1 - Q_1$ and $P_2 - Q_2$

are parallel.

Two planes are said to be parallel (in 3-space) if their normal vectors are
parallel. They are said to be perpendicular if their normal vectors are
perpendicular. The angle between two planes is defined to be the angle
between their normal vectors.

Example. Find the cosine of the angle between the planes

$2x - y + z = 0$,
$x + 2y - z = 1$.

This cosine is the cosine of the angle between (2, -1, 1) and (1, 2, -1)
and is therefore equal to $-\frac{1}{6}$, since the scalar product of the two
normal vectors is 2 - 2 - 1 = -1 and each of them has norm $\sqrt{6}$.
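
The cosine in this example can be recomputed from the two normal vectors, read off from the coefficients of x, y, z in each equation. This is a sketch only; the function name `cos_angle` is ours, and the value it returns is a floating-point number.

```python
import math

def dot(a, b):
    """Scalar product of two n-tuples of the same length."""
    return sum(x * y for x, y in zip(a, b))

def cos_angle(n1, n2):
    """Cosine of the angle between two non-zero vectors."""
    return dot(n1, n2) / math.sqrt(dot(n1, n1) * dot(n2, n2))

# Normal vectors of 2x - y + z = 0 and x + 2y - z = 1.
N1, N2 = (2, -1, 1), (1, 2, -1)

print(dot(N1, N2), dot(N1, N1), dot(N2, N2))   # -1 6 6
print(cos_angle(N1, N2))                       # close to -1/6
```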


Exercises 

Find a parametric equation for the line passing through the following points.

1. (1, 1, -1) and (-2, 1, 3)        2. (-1, 5, 2) and (3, -4, 1)

Find the equation of the line in 2-space, perpendicular to A and passing
through B, for the following values of A and B.

3. A = (1, -1), B = (-5, 3)        4. A = (-5, 4), B = (3, 2)

5. Show that the lines

$3x - 5y = 1, \quad 2x + 3y = 5$

are not perpendicular.



6. Which of the following pairs of lines are perpendicular? 

(a) 3x — Sy = 1 and 2x + i/ = 2. 

(b) 2x + 7i/ = I and x — y — b. 

(c) 3x — 5 j/ = I and 5x + 3y = 7. 

(d) —X -h y = 2 and x + y = 9. 

7. Find the equation of the plane perpendicular to the given vector N and
passing through the given point P.

(a) N = (1, -1, 3), P = (4, 2, -1)

(b) N = (-3, -2, 4), P = (2, π, -5)

(c) N = (-1, 0, 5), P = (2, 3, 7)


8. Find the equation of the plane passing through the following three points.

(a) (2, 1, 1), (3, -1, 1), (4, 1, -1)

(b) (-2, 3, -1), (2, 2, 3), (-4, -1, 1)

(c) (-5, -1, 2), (1, 2, -1), (3, -1, 2)

9. Find a vector perpendicular to (1, 2, -3) and (2, -1, 3), and another
vector perpendicular to (-1, 3, 2) and (2, 1, 1).

10. Let P be the point (1, 2, 3, 4) and Q the point (4, 3, 2, 1). Let A be the
vector (1, 1, 1, 1). Let L be the line passing through P and parallel to A.

(a) Given a point X on the line L, compute the distance between Q and X
(as a function of the parameter t).

(b) Show that there is precisely one point $X_0$ on the line such that this
distance achieves a minimum, and that this minimum is $2\sqrt{5}$.

(c) Show that $X_0 - Q$ is perpendicular to the line.

11. Let P be the point (1, -1, 3, 1) and Q the point (1, 1, -1, 2). Let A be
the vector (1, -3, 2, 1). Solve the same questions as in the preceding problem,
except that in this case the minimum distance is $\sqrt{146/15}$.


12. Find a vector parallel to the line of intersection of the two planes

$$2x - y + z = 1, \qquad 3x + y + z = 2.$$

13. Same question for the planes

$$2x + y + 5z = 2, \qquad 3x - 2y + z = 3.$$


14. Find a parametric equation for the line of intersection of the planes of 
Exercises 12 and 13. 


15. Find the cosine of the angle between the following planes:

(a) x + y + z = 1 and x - y - z = 5

(b) 2x + 3y - z = 2 and x - y + z = 1

(c) x + 2y - z = 1 and -x + 3y + z = 2

(d) 2x + y + z = 3 and -x - y + z = π


16. Let $X \cdot N = P \cdot N$ be the equation of a plane in 3-space. Let Q be a
point not lying in the plane. Show that there is a unique number t such that
Q + tN lies in the plane (i.e. satisfies the equation of the plane). What is this
value in terms of P, Q, and N?





17. Let Q = (1, —1, 2), P = (1, 3, -2), and N = (1, 2, 2). Find the point 
of intersection of the line through P in the direction of N, and the plane through 
Q perpendicular to N. 

18. Let P, Q be two points and N a vector in 3-space. Let P' be the point of
intersection of the line through P, in the direction of N, and the plane through Q,
perpendicular to N. We define the distance from P to that plane to be the
distance between P and P'. Find this distance when

P = (1, 3, 5), Q = (-1, 1, 7), N = (-1, 1, -1).

19. Let P = (1, 3, 5) and A = (-2, 1, 1). Find the intersection of the line
through P in the direction of A, and the plane

$$2x + 3y - z = 1.$$

20. Find the distance between the point (1, 1, 2) and the plane

$$3x + y - 5z = 2.$$

21. Let P = (1, 3, -1) and Q = (-4, 5, 2). Determine the coordinates of
the following points: (a) The midpoint of the line segment between P and Q.
(b) The two points on this line segment lying one-third and two-thirds of the
way from P to Q.

22. If P, Q are two arbitrary points in n-space, give the general formula for
the midpoint of the line segment between P and Q.

§6. The cross product

This section applies only in 3-space!

Let $A = (a_1, a_2, a_3)$ and $B = (b_1, b_2, b_3)$ be two vectors in 3-space.
We define their cross product

$$A \times B = (a_2 b_3 - a_3 b_2,\; a_3 b_1 - a_1 b_3,\; a_1 b_2 - a_2 b_1).$$

We leave the following assertions as exercises: 

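1. $A \times B = -(B \times A)$.

2. $A \times (B + C) = (A \times B) + (A \times C)$.

3. For any number a, we have

$$(aA) \times B = a(A \times B) = A \times (aB).$$

4. $(A \times B) \times C = (A \cdot C)B - (B \cdot C)A$.

5. $A \times B$ is perpendicular to both A and B.

6. $(A \times B)^2 = (A \cdot A)(B \cdot B) - (A \cdot B)^2$.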

From 6 and our interpretation of the dot product, we conclude that

$$\|A \times B\|^2 = \|A\|^2 \|B\|^2 - \|A\|^2 \|B\|^2 \cos^2\theta,$$

where $\theta$ is the angle between A and B. Hence we obtain

$$\|A \times B\|^2 = \|A\|^2 \|B\|^2 \sin^2\theta$$

or

$$\|A \times B\| = \|A\|\,\|B\|\,|\sin\theta|.$$

This is analogous to the formula which gave us the absolute value of $A \cdot B$.
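The definition of the cross product and assertions 1, 5, and 6 can be tried out on a concrete pair of vectors. The sketch below is only an illustration: the function names and the sample vectors are chosen here, and a check on one example is of course not a proof.

```python
def cross(a, b):
    """Cross product of two vectors in 3-space, following the definition above."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def dot(a, b):
    """Dot product of two vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))

# Sample vectors with integer entries, so all arithmetic below is exact.
A = (1, 2, -3)
B = (2, -1, 3)
C = cross(A, B)

print(C)                      # the vector A x B
print(dot(C, A), dot(C, B))   # both dot products vanish (assertion 5)

# Assertion 6: (A x B)^2 = (A.A)(B.B) - (A.B)^2
lhs = dot(C, C)
rhs = dot(A, A) * dot(B, B) - dot(A, B) ** 2
print(lhs, rhs)
```

Using integer entries avoids rounding, so the two sides of assertion 6 can be compared with exact equality.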

Exercises 

Find $A \times B$ for the following vectors.

1. A = (1, -1, 1) and B = (-2, 3, 1)

2. A = (-1, 1, 2) and B = (1, 0, -1)

3. A = (1, 1, -3) and B = (-1, -2, -3)

4. Find $A \times A$ and $B \times B$, in Exercises 1 through 3.

5. Let $E_1 = (1, 0, 0)$, $E_2 = (0, 1, 0)$, and $E_3 = (0, 0, 1)$. Find $E_1 \times E_2$,
$E_2 \times E_3$, $E_3 \times E_1$.

To do the rest of the exercises, wait until you have read Chapter II, §1. 

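6. If X(t) and Y(t) are two differentiable curves (defined for the same values
of t), show that

$$\frac{d}{dt}\,[X(t) \times Y(t)] = X(t) \times \frac{dY(t)}{dt} + \frac{dX(t)}{dt} \times Y(t).$$

7. Show that

$$\frac{d}{dt}\,[X(t) \times \dot{X}(t)] = X(t) \times \ddot{X}(t).$$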



CHAPTER II 

Differentiation of Vectors 

We begin to acquire the flavour of the mixture of algebra, geometry, 
and differentiation. Each gains in appeal from being mixed with the 
other two. 

The chain rule especially leads into the classical theory of curves. As 
you will see, the chain rule in its various aspects occurs very frequently
in this book, and forms almost as basic a tool as the algebra of vectors, 
with which it will in fact be intimately mixed. 


§1. Derivative

Let I be an interval. A curve (defined on this interval) is a rule which
to each point of I associates a vector. If X denotes a curve defined on I,
and t is a point of I, then X(t) denotes the vector associated to t by X.
We can write

$$X(t) = (x_1(t), \dots, x_n(t)),$$

each $x_i(t)$ being a function of t. We say that this curve is differentiable if
each function $x_i(t)$ is a differentiable function of t.

For instance, the curve

$$X(t) = (\cos t, \sin t, t)$$

is a spiral. Here we have

$$x(t) = \cos t, \qquad y(t) = \sin t, \qquad z(t) = t.$$

Remark. Unless otherwise specified in what follows, we shall assume
that the intervals of definition for curves are open. Actually, this is slightly
unnatural in some cases, as when we wish to join two points by a curve.
It should therefore be remarked that most of what we prove holds also
for closed or half-closed intervals, provided one makes the following
conventions: We do not define the derivative if the interval consists only of
one point. If the interval has more than one point, and contains an end
point, say the left end point a, define the derivative of a function f at that
point to be the right derivative, or alternatively, to be the usual limit

$$\lim_{h \to 0} \frac{f(a + h) - f(a)}{h}$$

taken only for those values of h such that a + h lies in the interval. Then
the usual rules for differentiation of functions are true in this greater
generality, and thus Rules 1 through 4 below, and the chain rule of §2
remain true also. [An example of a statement which is not always true for
curves defined over closed intervals is given by Exercise 11(b).]

Let us try to differentiate vectors using a Newton quotient. We consider

$$\frac{X(t + h) - X(t)}{h} = \left( \frac{x_1(t + h) - x_1(t)}{h}, \dots, \frac{x_n(t + h) - x_n(t)}{h} \right)$$

and see that each component is a Newton quotient for the corresponding
coordinate. If each $x_i(t)$ is differentiable, then each quotient

$$\frac{x_i(t + h) - x_i(t)}{h}$$

approaches the derivative $dx_i/dt$. For this reason, we define the derivative
$dX/dt$ to be

$$\frac{dX}{dt} = \left( \frac{dx_1}{dt}, \dots, \frac{dx_n}{dt} \right).$$

In fact, we could also say that the vector

$$\left( \frac{dx_1}{dt}, \dots, \frac{dx_n}{dt} \right)$$

is the limit of the Newton quotient

$$\frac{X(t + h) - X(t)}{h}$$

as h approaches 0. Indeed, as h approaches 0, each component

$$\frac{x_i(t + h) - x_i(t)}{h}$$

approaches $dx_i/dt$. Hence the Newton quotient approaches the vector

$$\left( \frac{dx_1}{dt}, \dots, \frac{dx_n}{dt} \right).$$

For example, if $X(t) = (\cos t, \sin t, t)$ then

$$\frac{dX}{dt} = (-\sin t, \cos t, 1).$$
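The componentwise limit can be observed numerically on the spiral of this example. The following sketch forms the Newton quotient for shrinking values of h and compares it with (-sin t, cos t, 1). The function names and the sample value t = 1 are chosen here for illustration only.

```python
import math

def X(t):
    """The spiral X(t) = (cos t, sin t, t)."""
    return (math.cos(t), math.sin(t), t)

def dX(t):
    """Its derivative, taken component by component."""
    return (-math.sin(t), math.cos(t), 1.0)

def newton_quotient(curve, t, h):
    """The vector (X(t + h) - X(t)) / h, computed componentwise."""
    p, q = curve(t + h), curve(t)
    return tuple((a - b) / h for a, b in zip(p, q))

def max_error(t, h):
    """Largest componentwise gap between the quotient and the derivative."""
    quotient = newton_quotient(X, t, h)
    return max(abs(a - b) for a, b in zip(quotient, dX(t)))

t = 1.0
for h in (1e-1, 1e-3, 1e-5):
    print(h, max_error(t, h))  # the gap shrinks as h approaches 0
```

The third component of the quotient is exactly 1 up to rounding, since the third coordinate function is t itself.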





It will also be convenient to denote $dX/dt$ by $\dot{X}$. Thus in the previous
example, we would also write

$$\dot{X}(t) = (-\sin t, \cos t, 1).$$

If we visualize geometrically the meaning of the difference quotient

$$\frac{X(t + h) - X(t)}{h},$$

then it is reasonable to define a tangent vector to the curve (at time t) to
be any vector which is equal to a constant multiple of $\dot{X}(t)$, provided
$\dot{X}(t) \neq 0$. If $\dot{X}(t) = 0$, then we do not define the meaning of tangent
vector.

We also define the velocity vector to be $\dot{X}$. Thus the velocity vector $\dot{X}(t)$
at a given value of t is tangent to the curve (provided it is not 0).

In our previous example, when $t = \pi$, the velocity vector is

$$\dot{X}(\pi) = (0, -1, 1),$$

and for $t = \pi/4$, we get

$$\dot{X}(\pi/4) = \left( -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 1 \right).$$

We define the acceleration vector to be the derivative $d\dot{X}/dt$, provided of
course that $\dot{X}$ is differentiable. We shall also denote the acceleration vector
by $\ddot{X}$. In our example we see that

$$\ddot{X}(t) = (-\cos t, -\sin t, 0).$$

Since the derivative is defined componentwise, we have the following 
rules for differentiation. 

Rule 1. Let X(t) and Y(t) be two differentiable curves (defined for the same
values of t). Then the sum X(t) + Y(t) is differentiable, and

$$\frac{d(X(t) + Y(t))}{dt} = \frac{dX}{dt} + \frac{dY}{dt}.$$

Rule 2. Let c be a number, and let X(t) be differentiable. Then cX(t) is
differentiable, and

$$\frac{d(cX(t))}{dt} = c\,\frac{dX}{dt}.$$

Rule 3. Let f(t) be a differentiable function, and X(t) a differentiable
curve (defined for the same values of t). Then f(t)X(t) is differentiable, and

$$\frac{d(fX)}{dt} = f(t)\,\frac{dX}{dt} + \frac{df}{dt}\,X(t).$$

Rule 4. Let X(t) and Y(t) be two differentiable curves (defined for the same
values of t). Then $X(t) \cdot Y(t)$ is a differentiable function whose derivative is

$$\frac{d}{dt}\,[X(t) \cdot Y(t)] = \dot{X}(t) \cdot Y(t) + X(t) \cdot \dot{Y}(t).$$

(This is formally analogous to the derivative of a product of functions,
namely the first times the derivative of the second plus the second times
the derivative of the first, except that the product is now a scalar product.)

As an example of the proofs we shall give the third one in detail, and 
leave the others to you as exercises. 

To begin with, we make a remark concerning the product of a function
by a vector. For each value of t, f(t) is a number and X(t) is a vector.
Thus f(t)X(t) is simply equal to a number times a vector, and we have
already seen what this means. Thus if $X(t) = (x_1(t), \dots, x_n(t))$, and
f = f(t) is a function, then by definition,

$$f(t)X(t) = (f(t)x_1(t), \dots, f(t)x_n(t)).$$

We take the derivative of each component and can apply the rule for the
derivative of a product of functions. We obtain:

$$\frac{d(f(t)X(t))}{dt} = \left( f(t)\frac{dx_1}{dt} + \frac{df}{dt}x_1(t), \dots, f(t)\frac{dx_n}{dt} + \frac{df}{dt}x_n(t) \right).$$

Using the rule for the sum of two vectors, we see that the expression on
the right is equal to

$$\left( f(t)\frac{dx_1}{dt}, \dots, f(t)\frac{dx_n}{dt} \right) + \left( \frac{df}{dt}x_1(t), \dots, \frac{df}{dt}x_n(t) \right).$$

We can take f out of the vector on the left and df/dt out of the vector on
the right to obtain

$$f(t)\,\frac{dX}{dt} + \frac{df}{dt}\,X(t),$$

as desired.

We define the speed of the curve X(t) to be the length of the velocity
vector. If we denote the speed by v(t), then by definition we have

$$v(t) = \|\dot{X}(t)\|,$$

and thus

$$v(t)^2 = \dot{X}(t)^2 = \dot{X}(t) \cdot \dot{X}(t).$$





We can also omit the t from the notation, and write

$$v^2 = \dot{X} \cdot \dot{X} = \dot{X}^2.$$

The length of the acceleration vector is called the acceleration scalar, and
will be denoted by a(t). Warning: a(t) is not necessarily the derivative
of v(t).

We define the length of a curve X between two values a, b of t ($a \leq b$) in
the interval of definition of the curve to be the integral

$$\int_a^b v(t)\,dt = \int_a^b \|\dot{X}(t)\|\,dt.$$

By definition, we can rewrite this integral in the form

$$\int_a^b \sqrt{\left(\frac{dx_1}{dt}\right)^2 + \cdots + \left(\frac{dx_n}{dt}\right)^2}\;dt.$$

When n = 2, then this is the same formula for the length which we gave
in Volume I of this course. Thus the formula in dimension n is a very
natural generalization of the formula in dimension 2.
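The length integral can be approximated numerically. The sketch below applies a midpoint rule to the speed of the spiral (cos t, sin t, t) over one full turn, from t = 0 to t = 2π. The midpoint rule, the interval, and the function names are choices made here for illustration. Since the speed of this spiral is the constant √2, the exact length is 2π√2, which gives something to compare against.

```python
import math

def speed(t):
    """Speed of the spiral (cos t, sin t, t): the length of (-sin t, cos t, 1)."""
    velocity = (-math.sin(t), math.cos(t), 1.0)
    return math.sqrt(sum(c * c for c in velocity))

def curve_length(speed_fn, a, b, n=1000):
    """Midpoint-rule approximation of the integral of the speed from a to b."""
    h = (b - a) / n
    return h * sum(speed_fn(a + (k + 0.5) * h) for k in range(n))

length = curve_length(speed, 0.0, 2.0 * math.pi)
exact = 2.0 * math.pi * math.sqrt(2.0)
print(length, exact)  # the two values agree up to rounding
```

For a curve whose speed is not constant, the same function applies, but the number of subintervals n then controls the accuracy.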


Exercises 


Find the velocity vector of the following curves.

1. $(e^t, \cos t, \sin t)$
2. $(\sin 2t, \log(1 + t), t)$
3. $(\cos t, \sin t)$
4. $(\cos 3t, \sin 3t)$

5. In Exercises 3 and 4, show that the velocity vector is perpendicular to the
position vector.

6. In Exercises 3 and 4, show that the acceleration vector is in opposite
direction from the position vector.

7. Let A, B be two constant vectors. What is the velocity vector of the
curve X = A + tB?

8. Let X(t) be a differentiable curve. A plane or line which is perpendicular
to the velocity vector $\dot{X}(t)$ at the point X(t) is said to be normal to the curve at
the point t or also at the point X(t). Find the equation of a line normal to the
curves of Exercises 3 and 4 at the point $\pi/3$.

9. Find the equation of a plane normal to the curve

$$(e^t, t, t^2)$$

at the point t = 1.

10. Same question at the point t = 0.

11. Let X(t) be a differentiable curve defined on an open interval. Let Q be
a point which is not on the curve.

(a) Write down the formula for the distance between Q and an arbitrary
point on the curve.

(b) If $t_0$ is a value of t such that the distance between Q and $X(t_0)$ is at a
minimum, show that the vector $Q - X(t_0)$ is normal to the curve, at
the point $X(t_0)$. [Hint: Investigate the minimum of the square of the
distance.]

(c) If X(t) is the parametric equation of a straight line, show that there
exists a unique value $t_0$ such that the distance between Q and $X(t_0)$
is a minimum.

12. Find the length of the spiral $(\cos t, \sin t, t)$ between t = 0 and t = 1.

13. Find the length of the spiral $(\cos 2t, \sin 2t, 3t)$ between t = 1 and t = 3.

14. Assume that the differentiable curve X(t) lies on the sphere of radius 1.
Show that the velocity vector is perpendicular to the position vector. [Hint:
Start from the condition $X(t)^2 = 1$.]

15. Let A be a non-zero vector, c a number, and Q a point. Let $P_0$ be the
point of intersection of the line passing through Q, in the direction of A, and
the plane $X \cdot A = c$. Show that for all points P of the plane, we have

$$\|Q - P_0\| \leq \|Q - P\|.$$

[Hint: If $P \neq P_0$, consider the straight line passing through $P_0$ and P, and use
Exercise 11(c).]

§2. The chain rule and applications 

Let X be a vector and c a number. As a matter of notation it will be
convenient to define Xc to be cX, in other words, we allow ourselves to
multiply vectors by numbers on the right. If we have a curve X(t) defined
for some interval, and a function g(t) defined on the same interval, then
we let

$$X(t)g(t) = g(t)X(t).$$

Let X = X(t) be a differentiable curve.

Let f be a function defined on some interval, such that the values of f
lie in the domain of definition of the curve X(t). Then we may form the
composite curve $X \circ f$. If s is a number at which f is defined, we let the
value of $X \circ f$ at s be $(X \circ f)(s) = X(f(s))$.

For example, let $X(t) = (t^2, e^t)$ and let $f(s) = \sin s$. Then

$$X(f(s)) = (\sin^2 s, e^{\sin s}).$$

Each component of X(f(s)) becomes a function of s, just as when we
studied the chain rule for functions.

It is customary to keep the notation $\dot{X}$ to denote derivative with respect
to t. Since we shall deal with other variables than t, we agree to use the
prime ' to denote derivative. Thus if we have a differentiable curve

$$Y = Y(s) = (y_1(s), \dots, y_n(s)),$$

then we shall write

$$Y'(s) = (y_1'(s), \dots, y_n'(s))$$

for its derivative.

The chain rule asserts: If X is a differentiable curve and f is a differentiable
function defined on some interval, whose values are contained in the interval
of definition of the curve, then the composite curve $X \circ f$ is differentiable, and

$$(X \circ f)'(s) = X'(f(s))f'(s).$$

The expression on the right can also be written $f'(s)X'(f(s))$. It is the
product of the function f' times the vector X'.

In another notation, if we let t = f(s), then we can write the above
formula in the form

$$\frac{d(X \circ f)}{ds} = \frac{dX}{dt}\,\frac{df}{ds}.$$

The proof of the chain rule is trivial, using the chain rule for functions.
Indeed, let Y(s) = X(f(s)). Then

$$Y(s) = (x_1(f(s)), \dots, x_n(f(s))).$$

Taking the derivative term by term, we find:

$$Y'(s) = (x_1'(f(s))f'(s), \dots, x_n'(f(s))f'(s)).$$

We can take f'(s) outside the vector, and get

$$Y'(s) = f'(s)X'(f(s)),$$

which is precisely what we want.
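The chain rule can be checked numerically on the curve X(t) = (t², eᵗ) with f(s) = sin s used earlier in this section. The sketch below compares f'(s)X'(f(s)) with a symmetric difference quotient of Y(s) = X(f(s)). The symmetric quotient is a numerical device chosen here, not the definition used in the text, and all function names and the sample value s = 0.7 are illustrative.

```python
import math

def X(t):
    """The curve X(t) = (t^2, e^t)."""
    return (t * t, math.exp(t))

def X_prime(t):
    """Its derivative X'(t) = (2t, e^t)."""
    return (2.0 * t, math.exp(t))

def f(s):
    return math.sin(s)

def f_prime(s):
    return math.cos(s)

def Y(s):
    """The composite curve Y(s) = X(f(s))."""
    return X(f(s))

def chain_rule(s):
    """The right-hand side f'(s) X'(f(s)) of the chain rule."""
    return tuple(f_prime(s) * c for c in X_prime(f(s)))

def symmetric_quotient(curve, s, h=1e-6):
    """(curve(s + h) - curve(s - h)) / (2h), computed componentwise."""
    p, q = curve(s + h), curve(s - h)
    return tuple((a - b) / (2.0 * h) for a, b in zip(p, q))

s = 0.7
exact = chain_rule(s)
approx = symmetric_quotient(Y, s)
print(exact)
print(approx)  # agrees with the line above to several decimal places
```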

Let us now assume that all the functions with which we dealt above
have second derivatives. Using the chain rule, and the rule for the
derivative of a product, we obtain the following two formulas:

(1) $Y'(s) = f'(s)X'(f(s))$

(2) $Y''(s) = f''(s)X'(f(s)) + (f'(s))^2 X''(f(s))$

Since t = f(s), we can also write these in the form:

(1) $Y'(s) = f'(s)\dot{X}(t)$

(2) $Y''(s) = f''(s)\dot{X}(t) + (f'(s))^2 \ddot{X}(t)$.

We shall consider an important special case of these formulas. 

We have defined

$$v(t) = \|\dot{X}(t)\|$$

to be the speed. Let us now assume that each coordinate function of $\dot{X}(t)$
is continuous. In that case, we say that $\dot{X}(t)$ is continuous. Then v(t) is a
continuous function of t. We shall assume throughout that $v(t) \neq 0$ for
any value of t in the interval of definition of our curve. Then v(t) > 0
for all such values of t. We let

$$s(t) = \int v(t)\,dt$$

be a fixed indefinite integral of v(t) over our interval. (For instance, if
a is a point of the interval, we could let

$$s(t) = \int_a^t v(u)\,du.$$

We know that any two indefinite integrals of v over the interval differ
by a constant.) Then

$$\frac{ds}{dt} = v(t) > 0$$

for all values of t, and hence s is a strictly increasing function.
Consequently, the inverse function exists. Call it

$$t = f(s).$$

We can then write

$$X(t) = X(f(s)) = Y(s).$$

Thus we are in the situation described above.

We shall now give geometric interpretations for our formulas (1) and (2).
To begin with, we know from the theory of derivatives of inverse
functions that

$$f'(s) = \frac{df}{ds} = \left( \frac{ds}{dt} \right)^{-1} = \frac{1}{v(t)}.$$

Hence f'(s) is always positive. This means that in the present case, Y'
and $\dot{X}$ have the same direction. Furthermore,

$$\|Y'(s)\| = |f'(s)|\,\|\dot{X}(t)\| = f'(s)\,v(t).$$

By what we just saw above, this last expression is equal to 1. Thus Y'(s)
is a vector of length 1, a unit vector, in the same direction as $\dot{X}(t)$. Thus
the velocity vector of the curve Y has constant length!

In particular, we have $Y'(s)^2 = 1$. Differentiating with respect to s,
we get

$$2\,Y'(s) \cdot Y''(s) = 0.$$

Hence Y'(s) is perpendicular to Y''(s) for each value of s.





From (2), we see that the acceleration Y''(s) has two components.
First a tangential component

$$f''(s)\dot{X}(t)$$

in the direction of $\dot{X}(t)$, which involves the naive notion of scalar
acceleration, namely the second derivative f''(s). Second, another component in
the direction of $\ddot{X}(t)$, with a coefficient

$$(f'(s))^2$$

which is positive. (We assume of course that $\ddot{X}(t) \neq 0$.)

For a given value of t, let us assume that $\dot{X}(t) \neq 0$, $\ddot{X}(t) \neq 0$, and
also that $\dot{X}(t)$, $\ddot{X}(t)$ do not have the same direction. In the theory of
curves, the plane spanned by $\dot{X}(t)$ and $\ddot{X}(t)$ is called the osculating plane
to the curve at point X(t).

Example. Let $X(t) = (\sin t, \cos t, t)$. Find the osculating plane to this
curve at $t = \pi/2$.

We have $\dot{X}(\pi/2) = (0, -1, 1)$ and $\ddot{X}(\pi/2) = (-1, 0, 0)$. If we let
N = (0, 1, 1), then N is perpendicular to $\dot{X}(\pi/2)$ and $\ddot{X}(\pi/2)$.
Furthermore, let $P = X(\pi/2) = (1, 0, \pi/2)$. Then the osculating plane at P is
the plane passing through P, perpendicular to N, and its equation is
therefore $y + z = \pi/2$.
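A normal vector to the osculating plane can also be produced mechanically, as the cross product of the velocity and acceleration vectors (§6 of Chapter I). The sketch below does this for the curve (sin t, cos t, t) at t = π/2. It is an illustration under the assumption that the two vectors are not parallel, and the function names are chosen here. The cross product comes out as a multiple of the vector N = (0, 1, 1) used in the example, so it determines the same plane.

```python
import math

def cross(a, b):
    """Cross product of two vectors in 3-space."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

t = math.pi / 2
P = (math.sin(t), math.cos(t), t)                  # point X(t) on the curve
velocity = (math.cos(t), -math.sin(t), 1.0)        # first derivative
acceleration = (-math.sin(t), -math.cos(t), 0.0)   # second derivative

normal = cross(velocity, acceleration)   # perpendicular to both vectors
d = dot(normal, P)                       # the plane is  normal . X = d

print(normal)  # approximately (0, -1, -1), a multiple of N = (0, 1, 1)
print(d)       # approximately -pi/2, so the plane is y + z = pi/2
```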


Exercises 

1. Prove formula (2) from formula (1). 

2. Write a parametric equation of the tangent line to the given curve at the
given point in each of the following cases.

(a) $(\cos 4t, \sin 4t, t)$ at the point $t = \pi/8$.

(b) $(t, 2t, t^2)$ at the point (1, 2, 1).

(c) $(e^{3t}, e^{-3t}, 3\sqrt{2}\,t)$ at t = 1.

(d) $(t, t^3, t^4)$ at the point (1, 1, 1).

3. Find the length of the curves of Exercise 2 for the following intervals. 

(a) t = 0 to $t = \pi/8$. (b) t = 1 to t = 3. (c) t = 0 to t = 1/3.

4. Show that the two curves $(e^t, e^{2t}, 1 - e^{-t})$ and $(1 - \theta, \cos\theta, \sin\theta)$
intersect at the point (1, 1, 0). What is the angle between their tangents at that point?

5. At what points does the curve $(2t^2, 1 - t, 3 + t^2)$ intersect the plane

$$3x - 14y + z - 10 = 0\,?$$

6. Find the equation of the osculating plane of each of the curves of
Exercise 2, at the given point.

7. Let X(t) be a differentiable curve and suppose that $\dot{X}(t) = 0$ for all t
throughout an interval. What can you say about X(t)? Suppose $\dot{X}(t) \neq 0$
but $\ddot{X}(t) = 0$ throughout the interval. What can you say about X(t)?



CHAPTER III 

Functions of Several Variables 


We view functions of several variables as functions of points in space. 
This appeals to our geometric intuition, and also relates such functions 
more easily with the theory of vectors. The gradient will appear as a 
natural generalization of derivative. In this chapter we are mainly con¬ 
cerned with basic definitions and notions. We postpone the important 
theorems to the next chapter. 


§1. Graphs and level curves

In order to conform with usual terminology, and for the sake of brevity, 
a collection of objects will simply be called a set. In this chapter, we are 
mostly concerned with sets of points in space. 

Let S be a set of points in n-space. A function (defined on S) is a rule
which to each element of S associates a number.

In practice, we sometimes omit mentioning explicitly the set S, since 
the context usually makes it clear for which points the function is defined. 

Example 1. In 2-space (the plane) we can define a function f by the
rule $f(x, y) = x^2 + y^2$. It is defined for all points (x, y) and can be
interpreted geometrically as the square of the distance between the origin
and the point.

Example 2. Again in 2-space, let

$$f(x, y) = \frac{x^2 - y^2}{x^2 + y^2}$$

be defined for all $(x, y) \neq (0, 0)$. We do not define f at (0, 0) (also
written O).

Example 3. In 3-space, we can define a function f by the rule

$$f(x, y, z) = x^2 - \sin(xyz) + yz^3.$$

Since a point and a vector are the same thing (namely an n-tuple), we
can think of a function above also as a function of vectors. When we do
not want to write the coordinates, we write f(X) instead of $f(x_1, \dots, x_n)$.
As with numbers, we call f(X) the value of f at the point (or vector) X.





Just as with functions of one variable, one can define the graph of a
function f of n variables $x_1, \dots, x_n$ to be the set of points in (n + 1)-space
of the form

$$(x_1, \dots, x_n, f(x_1, \dots, x_n)),$$

the $(x_1, \dots, x_n)$ being in the domain of definition of f. Thus when n = 1,
the graph of a function f is a set of points (x, f(x)). When n = 2, the
graph of a function f is the set of points (x, y, f(x, y)). When n = 2, it
is already difficult to draw the graph since it involves a figure in 3-space.
However, we shall describe another means of visualizing the function.

For each number c, the equation f(x, y) = c is the equation of a curve
in the plane. We have considerable experience in drawing the graphs of
such curves, and we may therefore assume that we know how to draw
this graph in principle. This curve is called the level curve of f at c. It
gives us the set of points (x, y) where f takes on the value c. By drawing
a number of such level curves, we can get a good description of the function.

In Example 1 above, the level curves are described by equations

$$x^2 + y^2 = c.$$

These have a solution only when $c \geq 0$. In that case, they are circles
(unless c = 0 in which case the circle of radius 0 is simply the origin).
On Fig. 1, we have drawn the level curves for c = 1 and 4.

Figure 1

To find the level curves in Example 2, we have to determine the values (x, y)
such that

$$x^2 - y^2 = c(x^2 + y^2)$$

for a given number c. This amounts to solving

$$x^2(1 - c) = y^2(1 + c).$$

If x = 0, then f(0, y) = -1. Thus on the vertical line passing through
the origin, our function has the constant value -1. If $x \neq 0$, then we
can divide by $x^2$ in the above equality, and we obtain (for $c \neq -1$)

$$\frac{y^2}{x^2} = \frac{1 - c}{1 + c}.$$

Taking the square root, we obtain two level lines, namely

$$y = ax \quad\text{and}\quad y = -ax, \qquad\text{where } a = \sqrt{\frac{1 - c}{1 + c}}.$$





Thus the level curves are straight lines (excluding the origin). We have
drawn some of them on Fig. 2. (The numbers indicate the value of the
function on the corresponding line.)

It would of course be technically much more disagreeable to draw the
level lines in Example 3, and we shall not do so.

We see that the level lines are based on the same principle as the contour
lines of a map. Each line describes so to speak the altitude of the function.
If the graph is interpreted as a mountainous region, then each level curve
gives the set of points of constant altitude. In Example 1, a person wanting
to stay at a given altitude need but walk around in circles. In Example 2,
such a person should walk on a straight line towards or away from the
origin.
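The description of the level curves of Example 2 can be tested numerically: for a given level c, the function should take the value c at every point of the lines y = ax and y = -ax other than the origin. The following is a small sketch; the level c = 0.5 and the sample points are arbitrary choices made here for illustration.

```python
import math

def f(x, y):
    """The function of Example 2, defined away from the origin."""
    return (x * x - y * y) / (x * x + y * y)

def slope_for_level(c):
    """The number a with f(x, ax) = c, valid for -1 < c <= 1."""
    return math.sqrt((1.0 - c) / (1.0 + c))

c = 0.5
a = slope_for_level(c)

# Sample non-zero points on the two lines y = ax and y = -ax.
sample_x = (-2.0, -0.5, 1.0, 3.0)
values = [f(x, a * x) for x in sample_x] + [f(x, -a * x) for x in sample_x]
print(values)  # every entry is close to c
```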

If we deal with a function of three variables, say f(x, y, z), then
(x, y, z) = X is a point in 3-space. In that case, the set of points satisfying
the equation

$$f(x, y, z) = c$$

for some constant c is a surface. The notion analogous to that of level
curve is that of level surface.

In physics, a function f might be a potential function, giving the value
of the potential energy at each point of space. The level surfaces are then
sometimes called surfaces of equipotential. The function f might also give
a temperature distribution (i.e. its value at a point X is the temperature
at X). In that case, the level surfaces are called isothermal surfaces.

Exercises

Sketch the level lines for the following functions.

1. $x^2 + 2y^2$
2. $y - x^2$
3. $y - 3x^2$
4. $x - y^2$
5. $3x^2 + 3y^2$
6. $xy$




7. $(x - 1)(y - 2)$
8. $(x + 1)(y + 3)$
9. $\dfrac{x^2}{4} + \dfrac{y^2}{16}$
10. $2x - 3y$
11. $\dfrac{xy}{x^2 + y^2}$
12. $\dfrac{xy^2}{x^2 + y^4}$
13. $\dfrac{4xy(x^2 - y^2)}{x^2 + y^2}$ (try polar coordinates)
14. $\dfrac{x + y}{x - y}$
15. $\dfrac{x^2 + y^2}{x^2 - y^2}$

(In Exercises 11, 12, and 13, the function is not defined at (0, 0). In 14, it is not
defined for y = x, and in 15 it is not defined for y = x or y = -x.)

16. $(x - 1)^2 + (y + 3)^2$
17. $x^2 - y^2$


§2. Partial derivatives 

In this section and the next, we discuss the notion of differentiability 
for functions of several variables. When we discussed the derivative of 
functions of one variable, we assumed that such a function was defined 
on an interval. We shall have to make a similar assumption in the case 
of several variables, and for this we need to introduce a new notion. 

Let P be a point in n-space, and let a be a number > 0. The set of
points X such that

$$\|X - P\| < a$$

will be called the open ball of radius a and center P. The set of points X
such that

$$\|X - P\| \leq a$$

will be called the closed ball of radius a and center P. The set of points X
such that

$$\|X - P\| = a$$

will be called the sphere of radius a and center P.

Thus when n = 1, we are in 1-space, and the open ball of radius a is
the open interval centered at P. The sphere of radius a and center P
consists only of two points.

When n = 2, the open ball of radius a and center P is also called the
open disc. The sphere is the circle.

When n = 3, then our terminology coincides with the obvious
interpretation we might want to place on the words.

Let $S_1$ be the sphere of radius 1, centered at the origin. Let a be a
number > 0. If X is a point of the sphere $S_1$, then aX is a point of the
sphere of radius a, because

$$\|aX\| = a\|X\| = a.$$



In this manner, we get all points of the sphere of radius a. (Proof?) Thus 
the sphere of radius a is obtained by stretching the sphere of radius 1, 
through multiplication by a. 

A similar remark applies to the open and closed balls of radius a, they 
being obtained from the open and closed balls of radius 1 through multi¬ 
plication by a. (Prove this as an exercise.) 

Let U be a set of points in n-space. We shall say that U is an open set
in n-space if the following condition is satisfied: Given any point P in U,
there exists an open ball B of radius a > 0 which is centered at P and
such that B is contained in U.

Example 1. In the plane, the set consisting of the first quadrant, ex¬ 
cluding the X- and y-axes, is an open set. 

The x-axis is not open in the plane (i.e. in 2-space). Given a point on 
the x-axis, we cannot find an open disc centered at the point and contained 
in the x-axis. 

On the other hand, if we view the x-axis as the set of points in 1-space, 
then it is open in 1-space. Similarly, the interval 

-1 < X < 1 

is open in 1-space, but not open in 2-space, or n-space for n > 1. 

When we defined the derivative as a limit of

$$\frac{f(x + h) - f(x)}{h},$$

we needed the function f to be defined in some open interval around the
point x.

Let now f be a function of n variables, defined on an open set U. Then
for any point X in U, the function f is also defined at all points which are
close to X, namely all points which are contained in an open ball centered
at X and contained in U.

For small values of h, the point

$$(x_1 + h, x_2, \dots, x_n)$$

is contained in such an open ball. Hence the function is defined at that
point, and we may form the quotient

$$\frac{f(x_1 + h, x_2, \dots, x_n) - f(x_1, \dots, x_n)}{h}.$$

If the limit exists as h tends to 0, then we call it the first partial derivative
of f and denote it by $D_1 f(x_1, \dots, x_n)$, or $D_1 f(X)$, or also by

$$\frac{\partial f}{\partial x_1}.$$

Similarly, we let

$$D_i f(X) = \frac{\partial f}{\partial x_i} = \lim_{h \to 0} \frac{f(x_1, \dots, x_i + h, \dots, x_n) - f(x_1, \dots, x_n)}{h}$$

if it exists, and call it the i-th partial derivative.

When n = 2 and we work with variables (x, y), then the first and second
partials are also noted

$$\frac{\partial f}{\partial x} \quad\text{and}\quad \frac{\partial f}{\partial y}.$$

A partial derivative is therefore obtained by keeping all but one variable 
fixed, and taking the ordinary derivative with respect to this one variable. 

Example 2. Let $f(x, y) = x^2 y^3$. Then

$$\frac{\partial f}{\partial x} = 2xy^3 \quad\text{and}\quad \frac{\partial f}{\partial y} = 3x^2 y^2.$$
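The partial derivatives of this example can be compared with the quotient used in the definition, in which only one variable is moved by h. The sketch below evaluates that quotient for f(x, y) = x²y³ at a sample point. The function name `partial_quotient`, the point (1.5, 2), and the step h are choices made here for illustration.

```python
def f(x, y):
    """The function of Example 2."""
    return x ** 2 * y ** 3

def partial_quotient(func, point, i, h):
    """(f(..., x_i + h, ...) - f(...)) / h with all other variables held fixed."""
    shifted = list(point)
    shifted[i] += h
    return (func(*shifted) - func(*point)) / h

P = (1.5, 2.0)
h = 1e-6

dfdx = partial_quotient(f, P, 0, h)   # compare with 2*x*y^3 = 24 at P
dfdy = partial_quotient(f, P, 1, h)   # compare with 3*x^2*y^2 = 27 at P
print(dfdx, dfdy)
```

Since h is small but not zero, the printed values differ from 24 and 27 by a small amount, of the order of h.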

We observe that when the partial derivatives are defined at all points
where the function is defined, then they are themselves functions. This is
the reason why the notation $D_i f$ is sometimes more useful than the
notation $\partial f/\partial x_i$. It allows us to write $D_i f(P)$ for any point P in the set where
the partial is defined. There cannot be any ambiguity or confusion with
a (meaningless) symbol $D_i(f(P))$, since f(P) is a number. Thus $D_i f(P)$
means $(D_i f)(P)$. It is the value of the function $D_i f$ at P.

Let f be defined in an open set U and assume that the partial derivatives
of f exist at each point X of U. The vector

$$\left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right) = (D_1 f(X), \dots, D_n f(X)),$$

whose components are the partial derivatives, will be called the gradient
of f at X and will be denoted by grad f(X). One must read this

(grad f)(X),

but we shall usually omit the parentheses around grad f.

In Example 2, we see that

$$\operatorname{grad} f(X) = \operatorname{grad} f(x, y) = (2xy^3, 3x^2 y^2).$$

Thus the gradient is a rule which to each point X associates a vector. This
is a different kind of thing from a function, which is a rule associating a
number to a point.

Using the formula for the derivative of a sum of two functions, and
the derivative of a constant times a function, we conclude at once that
the gradient satisfies the following properties:

Theorem 1. Let f, g be two functions defined on an open set U, and
assume that their partial derivatives exist at every point of U. Let c be a
number. Then

grad (f + g) = grad f + grad g
grad (cf) = c grad f.


You should carry out the details of the proof as an exercise. 

We shall give later several geometric and physical interpretations for 
the gradient. 

Exercises 


Find the partial derivatives

$$\frac{\partial f}{\partial x}, \quad \frac{\partial f}{\partial y}, \quad\text{and}\quad \frac{\partial f}{\partial z}$$

for the following functions f(x, y) or f(x, y, z).

1. $xy + z$
2. $x^2 y^5 + 1$
3. $\sin(xy) + \cos z$
4. $\cos(xy)$
5. $\sin(xyz)$
6. $e^{xyz}$
7. $x^2 \sin(yz)$
8. $xyz$
9. $xz + yz + xy$
10. $x \cos(y - 3z) + \arcsin(xy)$

11. Find grad f(P) if P is the point (1, 2, 3) in Exercises 1, 2, 6, 8, and 9.

12. Find grad f(P) if P is the point (1, π, π) in Exercises 4, 5, 7.

13. Find grad f(P) if $f(x, y, z) = \log(z + \sin(y^2 - x))$ and P = (1, -1, 1).

14. Find the partial derivatives of $x^y$.


§3. Differentiability and gradient

Let / he a function defined on an open set V. Let .Y be a point of U. 
For all vectors II sucli that |]//lj is small (and II 9 ^ 0), the point .Y + // 
also lie.s in the open set. However we cannot form a quotient 

fix + / /) “ f(X ) 

H 

because it is meaningless to divide by a vector. In order to define what 
we mean for a function / to be differentiable, we must therefore find a way 
which does not involve dividing by II. 

We reconsider the case of functions of one variable. We had defined
the derivative to be

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}.$$

Let

$$g(x, h) = \frac{f(x + h) - f(x)}{h} - f'(x).$$

Then g(x, h) is not defined when h = 0, but for each value of x,

$$\lim_{h \to 0} g(x, h) = 0.$$

We can write

$$f(x + h) - f(x) = f'(x)h + h\, g(x, h).$$

This relation has meaning so far only when h ≠ 0. However, we observe
that if we define g(x, 0) to be 0, then the preceding relation is obviously
true when h = 0 (because we just get 0 = 0).

Furthermore, we can replace h by -h if we replace g by -g. Thus we
have shown that if f is differentiable, there exists a function g(x, h) such
that

$$f(x + h) - f(x) = f'(x)h + |h|\, g(x, h) \tag{1}$$

and such that

$$\lim_{h \to 0} g(x, h) = 0.$$

Conversely, suppose that there exists a function φ(x) and a function
g(x, h) such that

$$\lim_{h \to 0} g(x, h) = 0, \tag{1a}$$

and

$$f(x + h) - f(x) = \varphi(x)h + |h|\, g(x, h).$$

We find for h ≠ 0,

$$\frac{f(x + h) - f(x)}{h} = \varphi(x) + \frac{|h|}{h}\, g(x, h).$$

Taking the limit as h approaches 0, we observe that

$$\lim_{h \to 0} \frac{|h|}{h}\, g(x, h) = 0.$$

Hence the limit of the Newton quotient exists and is equal to φ(x). Hence
f is differentiable, and its derivative f'(x) is equal to φ(x).

Therefore, the existence of functions φ(x) and g(x, h) satisfying (1a)
above could have been used as the definition of differentiability in the case
of functions of one variable. The great advantage of (1) is that no h appears
in the denominator. It is this relation which will suggest to us how to
define differentiability for functions of several variables, and how to prove
the chain rule for them.

We now consider a function of n variables.

Let f be a function defined on an open set U. Let X be a point of U.
If H = (h_1, ..., h_n) is a vector such that ||H|| is small enough, then
X + H will also be a point of U and so f(X + H) is defined. Note that

$$X + H = (x_1 + h_1, \ldots, x_n + h_n).$$

This is the generalization of the x + h with which we dealt previously.

The point X + H is close to X and we are interested in the difference
f(X + H) - f(X), which is the difference of the value of the function at
X + H and the value of the function at X.

We say that f is differentiable at X if the partial derivatives D_1 f(X), ...,
D_n f(X) exist, and if there exists a function g(X, H) (defined for small H)
such that

$$\lim_{H \to 0} g(X, H) = 0 \qquad \Bigl(\text{also written } \lim_{\|H\| \to 0} g(X, H) = 0\Bigr)$$

and

$$f(X + H) - f(X) = D_1 f(X) h_1 + \cdots + D_n f(X) h_n + \|H\|\, g(X, H).$$

With the other notation for partial derivatives, this last relation reads:

$$f(X + H) - f(X) = \frac{\partial f}{\partial x_1} h_1 + \cdots + \frac{\partial f}{\partial x_n} h_n + \|H\|\, g(X, H).$$

We say that f is differentiable in the open set U if it is differentiable at
every point of U, so that the above relation holds for every point X in U.

In view of the definition of the gradient in §2, we can rewrite our
fundamental relation in the form

$$f(X + H) - f(X) = (\operatorname{grad} f(X)) \cdot H + \|H\|\, g(X, H). \tag{2}$$

The term ||H|| g(X, H) has an order of magnitude smaller than the previous
term involving the dot product. This is one advantage of the present
notation. We know how to handle the formalism of dot products and are
accustomed to it, and its geometric interpretation. This will help us later
in interpreting the gradient geometrically.

For the moment, we observe that the gradient is the only vector which
will make formula (2) valid (cf. Exercise 7).

Formula (2) is the one which is used throughout the applications of
differentiability. It is therefore important to know when a function is
differentiable. The next theorem will give us a criterion which can be
used in practice.
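
Formula (2) can be explored numerically. The following is only a sketch: the function f below is a hypothetical sample chosen for illustration (it does not come from the text), and its gradient was worked out by hand. The code solves (2) for g(X, H) and prints its size as H shrinks along a fixed direction.

```python
import math

def f(x, y):
    # Hypothetical sample function, chosen only for illustration.
    return x * x * y + math.sin(y)

def grad_f(x, y):
    # Partial derivatives of the sample function, computed by hand.
    return (2.0 * x * y, x * x + math.cos(y))

def remainder(X, H):
    """Return g(X, H) = (f(X+H) - f(X) - grad f(X) . H) / ||H||."""
    (x, y), (h, k) = X, H
    gx, gy = grad_f(x, y)
    norm_h = math.hypot(h, k)
    return (f(x + h, y + k) - f(x, y) - (gx * h + gy * k)) / norm_h

X = (1.0, 2.0)
direction = (0.6, -0.8)  # a fixed unit vector; H = s * direction
scales = (1e-1, 1e-2, 1e-3)
values = [abs(remainder(X, (s * direction[0], s * direction[1])))
          for s in scales]
for s, v in zip(scales, values):
    print(f"||H|| = {s:g}   |g(X, H)| = {v:.3e}")
```

For this sample function the printed values shrink roughly in proportion to ||H||, which is the behaviour the condition lim g(X, H) = 0 describes.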

Let g be a function. We shall say that g is continuous if for every point X
such that g(X) is defined, we have

$$\lim_{Q \to X} g(Q) = g(X).$$

In other words, as Q approaches X, g(Q) must approach g(X).





Theorem 2. Let f be a function defined on some open set U. Assume
that its partial derivatives exist for every point in this open set, and that
they are continuous. Then f is differentiable.

Proof. For simplicity of notation, we shall use two variables. Thus we
deal with a function f(x, y). We let H = (h, k). Let (x, y) be a point in U,
and take H small, H ≠ (0, 0). We have to consider the difference
f(X + H) - f(X), which is simply

$$f(x + h, y + k) - f(x, y).$$

This is equal to

$$f(x + h, y + k) - f(x, y + k) + f(x, y + k) - f(x, y).$$

Applying the mean value theorem for functions of one variable, and
applying the definition of partial derivatives, we see that there is a number
s between x and x + h such that

$$f(x + h, y + k) - f(x, y + k) = D_1 f(s, y + k)\, h. \tag{3}$$

Similarly, there is a number t between y and y + k such that

$$f(x, y + k) - f(x, y) = D_2 f(x, t)\, k. \tag{4}$$

We shall now analyse the expressions on the right-hand side of equations
(3) and (4).

Let

$$g_1(X, H) = D_1 f(s, y + k) - D_1 f(x, y).$$

As H approaches 0, (s, y + k) approaches (x, y) because s is between x
and x + h. Since D_1 f is continuous, it follows that

$$\lim_{H \to 0} g_1(X, H) = 0.$$

But

$$D_1 f(s, y + k) = D_1 f(x, y) + g_1(X, H).$$

Hence equation (3) can be rewritten as

$$f(x + h, y + k) - f(x, y + k) = D_1 f(x, y)\, h + h\, g_1(X, H). \tag{5}$$

By a similar argument, we can rewrite equation (4) in the form

$$f(x, y + k) - f(x, y) = D_2 f(x, y)\, k + k\, g_2(X, H) \tag{6}$$

with some function g_2(X, H) such that

$$\lim_{H \to 0} g_2(X, H) = 0.$$





If we add (5) and (6), we obtain

$$f(X + H) - f(X) = D_1 f(X)\, h + D_2 f(X)\, k + h\, g_1(X, H) + k\, g_2(X, H). \tag{7}$$

To prove our theorem it will therefore suffice to prove that these last two
terms can be put in the form indicated in the definition of differentiability.
We observe that

$$\frac{h}{\sqrt{h^2 + k^2}} = \frac{h}{\|H\|} \qquad \text{and} \qquad \frac{k}{\|H\|}$$

have absolute value ≤ 1. We therefore multiply and divide the last two
terms by ||H|| and bring them in the form:

$$\|H\| \left[ \frac{h}{\|H\|}\, g_1(X, H) + \frac{k}{\|H\|}\, g_2(X, H) \right].$$

We let

$$g(X, H) = \frac{h}{\|H\|}\, g_1(X, H) + \frac{k}{\|H\|}\, g_2(X, H).$$

Then

$$\lim_{H \to 0} g(X, H) = 0,$$

and we see that the right-hand side of (7) is now in the form

$$D_1 f(X)\, h + D_2 f(X)\, k + \|H\|\, g(X, H),$$

which proves our theorem.

Remark. If we dealt with n variables, then we would consider the
expression for f(X + H) - f(X) given by

$$\begin{aligned}
& f(x_1 + h_1, \ldots, x_n + h_n) - f(x_1, x_2 + h_2, \ldots, x_n + h_n) \\
& \quad + f(x_1, x_2 + h_2, \ldots, x_n + h_n) - f(x_1, x_2, x_3 + h_3, \ldots, x_n + h_n) \\
& \quad + \cdots \\
& \quad + f(x_1, \ldots, x_{n-1}, x_n + h_n) - f(x_1, \ldots, x_n).
\end{aligned}$$

We would then apply the mean value theorem at each step, take the sum,
and argue in essentially the same way as with two variables.


Exercises 

1. Show that *2 g 2|1//1|2 if // = (A, k). 

2. Show that

$$|h^2 + 3hk| \le 4\|H\|^2.$$

3. Show that 





4. If 1|//|1 ^ 1, show that 

\h^ + *3 + k^\ ^ 3||//||2. 


5. Show that

$$|(h + k)^4| \le 16\|H\|^4.$$

6. Let


g(h, k) = 


- A' 


be defined for (h, k) ≠ (0, 0). Find


lim gih, k), lim \lim g(h, A-)l 

A—0 ik-.oLA-.o J 

lim g{h, k), lim f lim gih, A-)l 

*-.0 fc-.oLi-.o J 

7. Let f be defined on an open set U. Let P be a point of U. Assume that
there are two vectors A, B and two functions g_1(H), g_2(H) such that

$$\lim_{H \to 0} g_1(H) = 0 \qquad \text{and} \qquad \lim_{H \to 0} g_2(H) = 0,$$

and such that

$$\begin{aligned}
f(P + H) - f(P) &= A \cdot H + \|H\|\, g_1(H) \\
&= B \cdot H + \|H\|\, g_2(H).
\end{aligned}$$

Show that A = B. (Hint: Subtract, and let H = tK for any K, t → 0.)

8. Let the assumptions be as in Exercise 7. Show that all partial derivatives
of f exist at P and that A = grad f(P). [Hint: Take H to be hE_i, with a unit
vector E_i.]

9. Let g(H) = g(h_1, ..., h_n) be a polynomial, i.e. an expression of the form

$$g(H) = \sum c_{i_1 \cdots i_n}\, h_1^{i_1} \cdots h_n^{i_n},$$

where the c_{i_1 ... i_n} are numbers, and the sum is taken over a finite number
of n-tuples (i_1, ..., i_n) of integers ≥ 0. We call the c_{i_1 ... i_n} the
coefficients of g, and abbreviate them by c_(i). Assume that g(0) = 0, and that
s is an integer > 0 such that i_1 + ... + i_n ≥ s for all (i). Show that for any
H with ||H|| ≤ 1 we have

$$|g(H)| \le NM\|H\|^s$$

if N is the number of terms in the sum expressing g, and M is a number such
that |c_(i)| ≤ M for all (i).

10. Read the part of Appendix 2 concerning o(H), and do the suggested
exercises.



CHAPTER IV 

The Chain Rule and the Gradient 


In this chapter, we prove the chain rule for functions of several variables 
and give a number of applications. Among them will be several interpre¬ 
tations for the gradient. These form one of the central points of our theory. 
They show how powerful the tools we have accumulated turn out to be. 

§1. The chain rule

Let f be a function defined on some open set U. Let X(t) be a curve
such that the values X(t) are contained in U. Then we can form f(X(t)),
which is a function of t.

As an example, take/(x, y) = e* sin (xy). Let AfO — Then 

f{X{t)) = sin (t^). 

This is a function of t in the old sense of functions of one variable.

The chain rule tells us how to find the derivative of this function,
provided we know the gradient of f and the derivative X'. Its statement
is as follows.

Let f be a function which is defined and differentiable on an open set U.
Let X(t) be a differentiable curve (defined for some interval of numbers t)
such that the values X(t) lie in the open set U. Then the function

$$f(X(t))$$

is differentiable (as a function of t), and

$$\bigl(f(X(t))\bigr)' = \bigl(\operatorname{grad} f(X(t))\bigr) \cdot X'(t).$$

In the notation dX/dt, this also reads

$$\frac{d f(X(t))}{dt} = \operatorname{grad} f(X(t)) \cdot \frac{dX}{dt}.$$


Proof. By definition, we must investigate the quotient

$$\frac{f(X(t + h)) - f(X(t))}{h}.$$

Let

$$K = K(t, h) = X(t + h) - X(t).$$

Then our quotient can be rewritten in the form

$$\frac{f(X(t) + K) - f(X(t))}{h}.$$

Using the definition of differentiability for f, we have

$$f(X + K) - f(X) = \operatorname{grad} f(X) \cdot K + \|K\|\, g(X, K)$$

and

$$\lim_{\|K\| \to 0} g(X, K) = 0.$$

Replacing K by what it stands for, namely X(t + h) - X(t), and dividing
by h, we obtain:

$$\frac{f(X(t + h)) - f(X(t))}{h} = \operatorname{grad} f(X(t)) \cdot \frac{X(t + h) - X(t)}{h} \pm \left\| \frac{X(t + h) - X(t)}{h} \right\| g(X, K).$$

As h approaches 0, the first term of the sum approaches what we want,
namely

$$\operatorname{grad} f(X(t)) \cdot X'(t).$$

The second term approaches

$$\pm \|X'(t)\| \lim_{h \to 0} g(X, K),$$

and when h approaches 0, so does K = X(t + h) - X(t). Hence the
second term of the sum approaches 0. This proves our chain rule.

Let us write out in full the chain rule in terms of components. For
simplicity, we do it in two variables (x, y). Then

$$\frac{d f(X(t))}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}.$$

This can be applied to the seemingly more general situation when x, y are
functions of more than one variable t. Suppose for instance that

$$x = \varphi(t, u) \qquad \text{and} \qquad y = \psi(t, u)$$

are differentiable functions of two variables. Let

$$g(t, u) = f(\varphi(t, u), \psi(t, u)).$$

If we keep u fixed and take the partial derivative of g with respect to t,
then we can apply our chain rule, and obtain

$$\frac{\partial g}{\partial t} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t}.$$

Example 1. Let f(x, y) = x^2 + 2xy. Let x = r cos θ and y = r sin θ.
Let g(r, θ) be the composite function. Find ∂g/∂θ.

We have

$$\frac{\partial x}{\partial \theta} = -r \sin \theta \qquad \text{and} \qquad \frac{\partial y}{\partial \theta} = r \cos \theta.$$

Hence

$$\frac{\partial g}{\partial \theta} = (2x + 2y)(-r \sin \theta) + 2x(r \cos \theta).$$

If you want the answer completely in terms of r, θ, you can substitute
r cos θ and r sin θ for x and y respectively in this expression.
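
The result of Example 1 can be checked numerically. The sketch below is an illustration only: the sample values of r and θ and the finite-difference step are arbitrary assumptions. It compares the chain-rule formula with a central difference of the composite function.

```python
import math

def f(x, y):
    return x * x + 2.0 * x * y

def g(r, theta):
    # Composite function g(r, theta) = f(r cos(theta), r sin(theta)).
    return f(r * math.cos(theta), r * math.sin(theta))

def dg_dtheta_chain(r, theta):
    # Formula from Example 1: (2x + 2y)(-r sin) + 2x (r cos).
    x, y = r * math.cos(theta), r * math.sin(theta)
    return ((2 * x + 2 * y) * (-r * math.sin(theta))
            + 2 * x * (r * math.cos(theta)))

def dg_dtheta_numeric(r, theta, eps=1e-6):
    # Central difference in theta with r held fixed.
    return (g(r, theta + eps) - g(r, theta - eps)) / (2 * eps)

r, theta = 1.5, 0.7  # arbitrary sample point
print(dg_dtheta_chain(r, theta), dg_dtheta_numeric(r, theta))
```

The two printed numbers should agree to several decimal places; the small discrepancy comes from the finite-difference approximation.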


Exercises 

(All functions are assumed to be differentiable as needed.)

1. If x = u(r, s, t) and y = v(r, s, t) and z = f(x, y), write out the formula
for ∂z/∂r and ∂z/∂t.

2. Find the partial derivatives with respect to x, y, s, and t for the following
functions.

(ii) /U. y, -") = x^ -f 3xyi — y-z, x = 2i + s, y == —t ~ z = f- + 
(b) f(x, y) = (x + y)/(1 - xy), x = sin 2t, y = cos (3t - s).

3. Let/(x, f/. 2 ) - (x-+ y--f- 2 “)Find d/, dx and 5//dy, 

4. Let r = (x_1^2 + ... + x_n^2)^{1/2}. What is ∂r/∂x_i?

5. If u = f(x - y, y - x), show that

$$\frac{\partial u}{\partial x} + \frac{\partial u}{\partial y} = 0.$$

0. If u = x^Jiy X, 2 - x), show that 

du . du , du 

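7. Let x = r cos θ and y = r sin θ. Let f(x, y) = g(r, θ). Show that

$$\left(\frac{\partial g}{\partial r}\right)^2 + \frac{1}{r^2}\left(\frac{\partial g}{\partial \theta}\right)^2 = \left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2.$$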


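8. Let g be a function of r, let r = ||X||, and X = (x, y, z). Let f(X) = g(r).
Show that

$$\left(\frac{dg}{dr}\right)^2 = \left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2 + \left(\frac{\partial f}{\partial z}\right)^2.$$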




9. Let g be a function of r, and r = ||X||. Let f(X) = g(r). Find grad f(X)
for the following functions.

(a) ff(r) = 1/r (b) ff(r) = 

(c) ff(r) = l/r^ (d) g(r) = 

(e) p(r) = log- (f) 9 (r) = 4/r" (m integer 1.) 

r 

10. Let x = u cos θ - v sin θ, and y = u sin θ + v cos θ, with θ equal to a
constant. Let f(x, y) = g(u, v). Show that

$$\left(\frac{\partial g}{\partial u}\right)^2 + \left(\frac{\partial g}{\partial v}\right)^2 = \left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2.$$

11. Let f be a differentiable function (in two variables) such that grad f(X) =
cX for some constant c and all X in 2-space. Show that f is constant on any
circle of radius a > 0. [Hint: Put x = a cos t and y = a sin t and find df/dt.]

12. Generalize to the case of n variables. [You may assume that any two
points on the sphere of radius a are connected by a differentiable curve X(t).]

13. Let r = ||X||. Let g be a differentiable function of one variable whose
derivative is never equal to 0. Let f(X) = g(r). Show that grad f(X) is parallel
to X for X ≠ 0.

14. Let f be a function which is differentiable at all points X ≠ 0 in n-space.
Assume that there exists an integer m ≥ 1 such that f(tX) = t^m f(X) for all
numbers t ≠ 0 and all points X ≠ 0. Prove Euler's relation:

$$x_1 \frac{\partial f}{\partial x_1} + \cdots + x_n \frac{\partial f}{\partial x_n} = m f(X),$$

which can also be written X · grad f(X) = m f(X).

15. Reconsider Exercise 6 in the light of Exercise 14. Generalize.

§2. Tangent plane 

Let f be a differentiable function and c a number. The set of points X
such that f(X) = c and grad f(X) ≠ 0 is called a surface.

Let X(t) be a differentiable curve. We shall say that the curve lies on
the surface if, for all t, we have

$$f(X(t)) = c.$$

This simply means that all the points of the curve satisfy the equation
of the surface. If we differentiate this relation, we get from the chain
rule:

$$\operatorname{grad} f(X(t)) \cdot X'(t) = 0.$$

Let P be a point of the surface, and let X(t) be a curve on the surface
passing through P. This means that there is a number t_0 such that
X(t_0) = P. For this value t_0, we obtain

$$\operatorname{grad} f(P) \cdot X'(t_0) = 0.$$

Thus the gradient of f at P is perpendicular to the tangent vector of the
curve at P. [We assume that X'(t_0) ≠ 0.] This is true for any differentiable
curve passing through P. It is therefore very reasonable to define the plane
(or hyperplane) tangent to the surface at P to be the plane passing through
P and perpendicular to the vector grad f(P). (We know from Chapter I
how to find such planes.) This definition applies only when grad f(P) ≠ 0.

If grad f(P) = 0, then we do not define the notion of tangent plane.

The fact that grad f(P) is perpendicular to every curve passing through
P on the surface also gives us an interpretation of the gradient as being
perpendicular to the surface

$$f(X) = c$$

(which is one of the level surfaces for the function f).

Example 1. Find the tangent plane to the surface

$$x^2 + y^2 + z^2 = 3$$

at the point (1, 1, 1).

Let f(X) = x^2 + y^2 + z^2. Then at the point P = (1, 1, 1),

$$\operatorname{grad} f(P) = (2, 2, 2).$$

The equation of a plane passing through P and perpendicular to a vector N
is

$$X \cdot N = P \cdot N.$$

In the present case, this yields

$$2x + 2y + 2z = 2 + 2 + 2 = 6.$$
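
The computation of Example 1 can be written as a short routine. This is a minimal sketch under the assumption that the gradient is supplied by hand; the helper names `grad_f`, `dot` and `tangent_plane` are hypothetical and introduced only for illustration.

```python
def grad_f(p):
    # Gradient of f(x, y, z) = x^2 + y^2 + z^2, computed by hand.
    x, y, z = p
    return (2 * x, 2 * y, 2 * z)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def tangent_plane(p):
    """Return (N, d) describing the plane X . N = d through p, N = grad f(p)."""
    n = grad_f(p)
    if all(c == 0 for c in n):
        # The text does not define a tangent plane when the gradient is zero.
        raise ValueError("gradient is zero: tangent plane not defined")
    return n, dot(p, n)

normal, d = tangent_plane((1, 1, 1))
print(normal, d)
```

At P = (1, 1, 1) the routine returns the normal vector (2, 2, 2) and the constant 6, matching the equation 2x + 2y + 2z = 6.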

Observe that our arguments also give us a means of finding a vector
perpendicular to a curve in 2-space at a given point, simply by applying
the preceding discussion to the plane instead of 3-space.

Example 2. Find the tangent line to the curve

$$x^2 y + y^3 = 10$$

at the point (1, 2), and find a vector perpendicular to the curve at that
point.

Let f(x, y) = x^2 y + y^3. The gradient at the given point P is easily
computed, and we find

$$\operatorname{grad} f(P) = (4, 13).$$

This is a vector perpendicular to the curve at the given point. The tangent
line is also given by X · N = P · N, and thus is

$$4x + 13y = 4 + 26 = 30.$$


Exercises 

1. Find the equation of the tangent plane and normal line to each of the
following surfaces at the specific point.

(a) x^2 + y^2 + z^2 = 49 at (6, 2, 3)

(b) xy + yz + zx - 1 = 0 at (1, 1, 0)

(c) x^2 + xy^2 + y^3 + z + 1 = 0 at (2, -3, 4)

(d) 2y - z^3 - 3xz = 0 at (1, 7, 2)

(e) x^2 y^2 + xz - 2y^3 = 10 at (2, 1, 4)

(f) sin xy + sin yz + sin xz = 1 at (1, π/2, 0)

2. Let f(x, y, z) = z - e^x sin y, and P = (log 3, 3π/2, -3). Find:

(a) grad f(P),

(b) the normal line at P to the level surface for f which passes through P,

(c) the tangent plane to this surface at P.

3. Find the parametric equation of the tangent line to the curve of
intersection of the following surfaces at the indicated point.

(a) x^2 + y^2 + z^2 = 49 and x^2 + y^2 = 13 at (3, 2, -6)

(b) xy + z = 0 and x^2 + y^2 + z^2 = 9 at (2, 1, -2)

(c) x^2 - y^2 - z^2 = 1 and x^2 - y^2 + z^2 = 9 at (3, 2, 2)

(Note. The tangent line above may be defined to be the line of intersection of
the tangent planes of the given point.)

4. Let f(X) = 0 be a differentiable surface. Let Q be a point which does not
lie on the surface. Given a differentiable curve X(t) on the surface, defined on
an open interval, give the formula for the distance between Q and a point X(t).
Assume that this distance reaches a minimum for t = t_0. Let P = X(t_0).
Show that the line joining Q to P is perpendicular to the curve at P.

§3. Directional derivative

Let f be defined on an open set and assume that f is differentiable. Let
P be a point of the open set, and let A be a unit vector (i.e. ||A|| = 1).
Then P + tA is the parametric equation of a straight line in the direction
of A and passing through P. We observe that

$$\frac{d(P + tA)}{dt} = A.$$

Hence by the chain rule, if we take the derivative of the function
f(P + tA), which is defined for small values of t, we obtain

$$\frac{d f(P + tA)}{dt} = \operatorname{grad} f(P + tA) \cdot A.$$

When t is equal to 0, this derivative is equal to

$$\operatorname{grad} f(P) \cdot A.$$

For obvious geometrical reasons, we call it the directional derivative of f
in the direction of A. We interpret it as the rate of change of f along the
straight line in the direction of A, at the point P.

Example. Let f(x, y) = x^2 + y^3 and let B = (1, 2). Find the
directional derivative of f in the direction of B, at the point (-1, 3).

We note that B is not a unit vector. Its length is √5. Let

$$A = \frac{1}{\sqrt{5}}(1, 2).$$

Then A is a unit vector having the same direction as B. Let P = (-1, 3).
Then grad f(P) = (-2, 27). Hence by our formula, the directional
derivative is equal to:

$$\operatorname{grad} f(P) \cdot A = \frac{1}{\sqrt{5}}(-2 + 54) = \frac{52}{\sqrt{5}}.$$
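
The recipe of the example (normalize the direction, then take the dot product with the gradient) can be sketched in a few lines. The function name `directional_derivative` is hypothetical, and the gradient value is simply the one quoted in the example rather than something the code derives.

```python
import math

def directional_derivative(grad, direction):
    """Dot the gradient with the unit vector in the given direction."""
    length = math.sqrt(sum(c * c for c in direction))
    if length == 0:
        raise ValueError("direction must be a nonzero vector")
    unit = [c / length for c in direction]
    return sum(g * a for g, a in zip(grad, unit))

grad_at_P = (-2.0, 27.0)   # value of grad f(P) quoted in the example
B = (1.0, 2.0)             # direction; not a unit vector
value = directional_derivative(grad_at_P, B)
print(value, 52 / math.sqrt(5))
```

Both printed numbers should coincide up to rounding, since the routine performs the same normalization by √5 as the worked example.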


Consider again a differentiable function f on an open set U.

Let P be a point of U. Let us assume that grad f(P) ≠ 0, and let A
be a unit vector. We know that

$$\operatorname{grad} f(P) \cdot A = \|\operatorname{grad} f(P)\|\, \|A\| \cos \theta,$$

where θ is the angle between grad f(P) and A. Since ||A|| = 1, we see
that the directional derivative is equal to

$$\|\operatorname{grad} f(P)\| \cos \theta.$$

The value of cos θ varies between -1 and +1 when we select all possible
unit vectors A.

The maximal value of cos θ is obtained when we select A such that
θ = 0, i.e. when we select A to have the same direction as grad f(P).
In that case, the directional derivative is equal to the length of the
gradient [cf. Exercise 10 of Chapter I, §4].

Thus we have obtained another interpretation for the gradient: Its
direction is that of maximal increase of the function, and its length is the rate
of increase of the function in that direction.

The directional derivative in the direction of A is a minimum when
cos θ = -1. This is the case when we select A to have opposite direction
to grad f(P). That direction is therefore the direction of maximal
decrease of the function.

For example, f might represent a temperature distribution in space. At
any point P, a particle which feels cold and wants to become warmer
fastest should move in the direction of grad f(P). Another particle which
is warm and wants to cool down fastest should move in the direction of
-grad f(P).
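
The claim that the gradient points in the direction of maximal increase can be illustrated by brute force in the plane. The sketch below makes two assumptions that are not in the text: the gradient vector (3, 4) is a made-up sample, and the unit vectors are sampled on a finite grid of angles, so the result is only approximate.

```python
import math

def best_direction(grad, samples=3600):
    """Scan unit vectors (cos a, sin a) and return the one maximising grad . A."""
    best_value, best_unit = -math.inf, None
    for i in range(samples):
        a = 2 * math.pi * i / samples
        unit = (math.cos(a), math.sin(a))
        value = grad[0] * unit[0] + grad[1] * unit[1]
        if value > best_value:
            best_value, best_unit = value, unit
    return best_value, best_unit

grad = (3.0, 4.0)   # hypothetical gradient of length 5
best_value, best_unit = best_direction(grad)
print(best_value, best_unit)
```

The largest sampled directional derivative is close to the length of the gradient, and the maximising unit vector is close to grad/||grad||, in line with the interpretation above.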


Exercises 

1. In Exercise 2 of the preceding section, find:

(a) The directional derivative of f at P in the direction of (1, 2, 2).

(b) The maximum and minimum values for the directional derivatives of
f at P.

2. Find the directional derivatives of the following functions at the specified 
points in the specified directions. 

(a) log (x^ -f at (1, 1), direction (2, 1). 

(b) xy + yz-\- zx at (—1, 1,7), direction (3, 4, —12). 

(c) 4x^ at (2, 1) in the direction of maximum directional derivative. 

3. A temperature distribution in space is given by the function f(x, y) =
10 + 6 cos x cos y + 3 cos 2x + 4 cos 3y. At the point (π/3, π/3), find the
direction of greatest increase of temperature, and the direction of greatest
decrease of temperature.

4. In what direction are the following functions of X increasing most rapidly
at the given point?

(a) x/|lA-p^=^ at (1,-1.2) {,Y = {x,y,z)) 

(b) II.Ytl® at (1,2, -1, 1) (Y = {x,y,z,w)) 

§4. Conservation law 

As a final application of the chain rule, we derive the conservation law
of physics.

Let U be an open set. By a vector field on U we mean a rule which to
every point of U associates a vector of the same dimension.

If f is a differentiable function on U, then we observe that grad f is a
vector field, which associates the vector grad f(P) to the point P of U.

A vector field in physics is often interpreted as a field of forces.

If F is a vector field on U, and X a point of U, then we denote by F(X)
the vector associated to X by F and call it the value of F at X, as usual.

If F is a vector field, and if there exists a differentiable function f such
that F = grad f, then the vector field is called conservative. Since
-grad f = grad (-f), it does not matter whether we use f or -f in the
definition of conservative.

Let us assume that F is a conservative field on U, and let Φ be a
differentiable function such that for all points X in U we have

$$F(X) = -\operatorname{grad} \Phi(X).$$

In physics, one interprets Φ as a potential function. Suppose that a
particle of mass m moves along a differentiable curve X(t) in U, and let us
assume that this particle obeys Newton's law:

$$F(X) = mX'', \qquad \text{i.e.} \qquad F(X(t)) = mX''(t)$$

for all t where X(t) is defined. Then according to our hypotheses,

$$mX'' + \operatorname{grad} \Phi(X) = O.$$

Take the dot product of both sides with X'. We obtain

$$mX'' \cdot X' + \operatorname{grad} \Phi(X) \cdot X' = 0.$$

But the derivative (with respect to t) of X'^2 = X' · X' is 2X' · X''. The
derivative with respect to t of Φ(X(t)) is equal to

$$\operatorname{grad} \Phi(X) \cdot X'$$

by the chain rule. Hence the expression on the left of our last equation is
the derivative of the function

$$\tfrac{1}{2} m X'^2 + \Phi(X),$$

and that derivative is 0. Hence this function is equal to a constant.
This is what one means by the conservation law.
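
The conservation law can be observed in a small simulation. Everything specific in the sketch below is an assumption made for illustration: the potential Φ(X) = ||X||^2 / 2, the mass, the initial data, and the velocity Verlet integration scheme (a standard numerical method, not something taken from the text). Because the integration is approximate, the energy is only conserved up to a small numerical drift.

```python
def grad_phi(x):
    # Gradient of the hypothetical potential Phi(X) = |X|^2 / 2.
    return [xi for xi in x]

def phi(x):
    return 0.5 * sum(xi * xi for xi in x)

def energy(m, x, v):
    # (1/2) m |X'|^2 + Phi(X): kinetic plus potential energy.
    return 0.5 * m * sum(vi * vi for vi in v) + phi(x)

def simulate(m, x, v, dt, steps):
    """Integrate m X'' = -grad Phi(X) with the velocity Verlet scheme."""
    energies = [energy(m, x, v)]
    a = [-gi / m for gi in grad_phi(x)]
    for _ in range(steps):
        x = [xi + vi * dt + 0.5 * ai * dt * dt for xi, vi, ai in zip(x, v, a)]
        a_new = [-gi / m for gi in grad_phi(x)]
        v = [vi + 0.5 * (ai + bi) * dt for vi, ai, bi in zip(v, a, a_new)]
        a = a_new
        energies.append(energy(m, x, v))
    return energies

energies = simulate(m=2.0, x=[1.0, 0.0], v=[0.0, 0.5], dt=1e-3, steps=5000)
drift = max(abs(e - energies[0]) for e in energies)
print(energies[0], drift)
```

The printed drift should be very small compared with the initial energy, reflecting that the sum of kinetic and potential energy stays constant along the motion.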

The function (1/2) m X'^2 is called the kinetic energy, and the conservation
law states that the sum of the kinetic and potential energies is constant.

It is not true that all vector fields are conservative. We shall discuss
the problem of determining which ones are conservative in the next
section.

The fields of classical physics are for the most part conservative. For
instance, consider a force which is inversely proportional to the square of
the distance from the point to the origin, and in the direction of the
position vector. Then there is a constant C such that for X ≠ 0 we have

$$F(X) = C\, \frac{1}{\|X\|^2}\, \frac{X}{\|X\|},$$

because

$$\frac{X}{\|X\|}$$

is a unit vector in the direction of X. Thus

$$F(X) = C\, \frac{1}{r^3}\, X,$$

where r = ||X||. A potential function for F, in the sense F = -grad Φ
used above, is given by

$$\Phi(X) = \frac{C}{r}.$$

Exercises 

1. Find a potential function for a force field which is inversely proportional 
to the distance from the point to the origin, and is in the direction of the position 
vector. 

2. Same question, replacing “distance" with "cube of the distance”. 




CHAPTER V 

Potential Functions and Line Integrals 

We are going to deal systematically with the possibility of finding a 
potential function for a vector field. The discussion of the existence of 
such a function will be limited to the case of two variables. Actually,
there is no essential difficulty in extending the results to arbitrary n-space 
but we leave this to the reader. 

The problem is one of integration, and the line integrals are a natural 
continuation of the integrals at the end of §1 (taken on vertical and 
horizontal lines). 


§1. Potential functions

Let F be a vector field on an open set U. If φ is a differentiable function
on U such that F = grad φ, then we say that φ is a potential function
for F.

One can raise two questions about potential functions. Are they unique,
and do they exist?

We consider the first question, and we shall be able to give a satisfactory
answer to it. The problem is analogous to determining an integral for a
function of one variable, up to a constant, and we shall formulate and
prove the analogous statement in the present situation.

We recall that even in the case of functions of one variable, it is not
true that whenever two functions f, g are such that

$$\frac{df}{dx} = \frac{dg}{dx},$$

then f and g differ by a constant, unless we assume that f, g are defined
on some interval. As we emphasized in the First Course, we could for
instance take

$$f(x) = \begin{cases} \dfrac{1}{x} + 5 & \text{if } x < 0, \\[2ex] \dfrac{1}{x} - \pi & \text{if } x > 0, \end{cases} \qquad g(x) = \frac{1}{x} \quad \text{if } x \neq 0.$$

Then f, g have the same derivative, but there is no constant C such that
for all x ≠ 0 we have f(x) = g(x) + C.

In the case of functions of several variables, we shall have to make a 
similar restriction on the domain of definition of the functions. 

Let U be an open set and let P, Q be two points of U. We shall say
that P, Q can be joined by a differentiable curve if there exists a
differentiable curve X(t) (with t ranging over some interval of numbers) which is
contained in U, and two values of t, say t_1 and t_2 in that interval, such
that

$$X(t_1) = P \qquad \text{and} \qquad X(t_2) = Q.$$

For example, if U is the entire plane, then any two points can be joined
by a straight line. In fact, if P, Q are two points, then we take

$$X(t) = P + t(Q - P).$$

When t = 0, then X(0) = P. When t = 1, then X(1) = Q.

It is not always the case that two points of an open set can be joined
by a straight line. We have drawn a picture of two points P, Q in an
open set U which cannot be so joined.



We are now in position to state the theorem we had in mind. 

Theorem 1. Let U be an open set, and assume that any two points in U
can be joined by a differentiable curve. Let f, g be two differentiable
functions on U. If grad f(X) = grad g(X) for every point X of U, then
there exists a constant C such that

$$f(X) = g(X) + C$$

for all points X of U.

Proof. We note that grad (f - g) = grad f - grad g = 0, and we
must prove that f - g is constant. Letting φ = f - g, we see that it
suffices to prove: If grad φ(X) = 0 for every point X of U, then φ is
constant.

Let P be a fixed point of U and let Q be any other point. Let X(t) be
a differentiable curve joining P to Q, which is contained in U, and defined
over an interval. The derivative of the function φ(X(t)) is, by the chain
rule,

$$\operatorname{grad} \varphi(X(t)) \cdot X'(t).$$

But X(t) is a point of U for all values of t in the interval. Hence by our
assumption, the derivative of φ(X(t)) is 0 for all t in the interval. Hence
there is a constant C such that

$$\varphi(X(t)) = C$$

for all t in the interval. In other words, the function φ is constant on the
curve. Hence φ(P) = φ(Q).

This result is true for any point Q of U. Hence φ is constant on U, as
was to be shown.

Our theorem proves the uniqueness of potential functions (within the 
restrictions placed by our extra hypothesis on the open set U). 

We still have the problem of determining when a vector field F admits 
a potential function. 

A complete discussion of this problem would lead us too far afield. We 
shall limit ourselves to some useful practical remarks in the case of func¬ 
tions of two variables. 

Let F be a vector field (in 2-space), so that we can write

$$F(x, y) = (f(x, y), g(x, y))$$

with functions f and g, defined over a suitable open set. We want to
know when there exists a function φ(x, y) such that

$$\frac{\partial \varphi}{\partial x} = f \qquad \text{and} \qquad \frac{\partial \varphi}{\partial y} = g.$$

Such a function would be a potential function for F, by definition. (We
assume throughout that all hypotheses of differentiability are satisfied as
needed.)

Suppose that such a function φ exists. Then

$$\frac{\partial f}{\partial y} = \frac{\partial}{\partial y}\left(\frac{\partial \varphi}{\partial x}\right) \qquad \text{and} \qquad \frac{\partial g}{\partial x} = \frac{\partial}{\partial x}\left(\frac{\partial \varphi}{\partial y}\right).$$

We shall show in the next chapter that under suitable hypotheses, the
two partial derivatives on the right are equal. This means that if there
exists a potential function for F, then

$$\frac{\partial f}{\partial y} = \frac{\partial g}{\partial x}.$$

This gives us a simple test in practice to tell whether a potential function
may exist.

Theorem 2. Let f, g be differentiable functions having continuous partial
derivatives on an open set U in 2-space. If

$$\frac{\partial f}{\partial y} \neq \frac{\partial g}{\partial x},$$

then the vector field F(x, y) = (f(x, y), g(x, y)) does not have a potential
function.

It can be shown that the converse is true in some very important cases.
We shall state a theorem which will give us conditions under which the
converse is true.

Theorem 3. Let f, g be differentiable functions on an open set of the
plane. If this open set is the entire plane, or if it is an open disc, or the
inside of a rectangle, if the partial derivatives of f, g exist and are
continuous, and if

$$\frac{\partial f}{\partial y} = \frac{\partial g}{\partial x},$$

then the vector field F(x, y) = (f(x, y), g(x, y)) has a potential function.

We shall indicate how a proof of Theorem 3 might go for a rectangle
after we have discussed some examples.

Example 1. Determine whether the vector field

$$F(x, y) = (e^{xy}, e^{x+y})$$

has a potential function.

Here, f(x, y) = e^{xy} and g(x, y) = e^{x+y}. We have:

$$\frac{\partial f}{\partial y} = x e^{xy} \qquad \text{and} \qquad \frac{\partial g}{\partial x} = e^{x+y}.$$

Since these are not equal, we know that there cannot be a potential
function.
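
The test ∂f/∂y = ∂g/∂x can be tried out numerically. The sketch below is an illustration under stated assumptions: the partial derivatives are estimated by central differences, the sample points and tolerance are arbitrary, and the helper names `partial` and `may_have_potential` are hypothetical. Agreement at a few sample points only suggests that a potential function may exist; disagreement at any point rules one out.

```python
import math

def partial(func, point, index, eps=1e-6):
    """Central-difference estimate of a partial derivative of func at point."""
    lo, hi = list(point), list(point)
    lo[index] -= eps
    hi[index] += eps
    return (func(*hi) - func(*lo)) / (2 * eps)

def may_have_potential(f, g, points, tol=1e-5):
    """True if df/dy and dg/dx agree (within tol) at every sample point."""
    return all(abs(partial(f, p, 1) - partial(g, p, 0)) <= tol for p in points)

points = [(0.5, 0.2), (1.0, -1.0), (-0.3, 0.8)]

# Field of Example 1: (e^{xy}, e^{x+y}).
ex1 = may_have_potential(lambda x, y: math.exp(x * y),
                         lambda x, y: math.exp(x + y), points)
# A second sample field, (2xy, x^2 + 3y^2), included for contrast.
ex2 = may_have_potential(lambda x, y: 2 * x * y,
                         lambda x, y: x * x + 3 * y * y, points)
print(ex1, ex2)
```

For the field of Example 1 the test fails at the sample points, in agreement with the conclusion reached by hand.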

If the partial derivatives ∂f/∂y and ∂g/∂x turn out to be equal, then
one can try to find a potential function by integrating with respect to
one of the variables. Thus we try to find

$$\int f(x, y)\, dx,$$

keeping y constant, and taking the ordinary integral of functions of one
variable. If we can find such an integral, it will be a function ψ(x, y),
whose partial with respect to x will be equal to f(x, y) (by definition).
Adding a function of y, we can then adjust it so that its partial with
respect to y is equal to g(x, y).

Example 2. Let F(x, y) = (2xy, x^2 + 3y^2). Determine whether this
vector field has a potential function, and if it does, find it.

Applying the test which we mentioned above, we find that a potential
function may exist. To find it, we consider first the integral

$$\int 2xy\, dx,$$

viewing y as constant. We obtain x^2 y for the indefinite integral. We
must now find a function u(y) such that

$$\frac{\partial}{\partial y}\bigl(x^2 y + u(y)\bigr) = x^2 + 3y^2.$$

This means that we must find a function u(y) such that

$$x^2 + \frac{du}{dy} = x^2 + 3y^2,$$

or in other words,

$$\frac{du}{dy} = 3y^2.$$

This is a simple integration problem in one variable, and we find u(y) = y^3.
Thus finally, if we let

$$\varphi(x, y) = x^2 y + y^3,$$

then we see that φ is a potential function for F.
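
The answer of Example 2 can be double-checked by differentiating φ numerically and comparing with the field. This is a sketch only: the sample points and the finite-difference step are arbitrary assumptions, and the helper `numeric_grad` is a hypothetical name.

```python
def phi(x, y):
    # Candidate potential function from Example 2.
    return x * x * y + y ** 3

def field(x, y):
    # Vector field F(x, y) = (2xy, x^2 + 3y^2).
    return (2 * x * y, x * x + 3 * y * y)

def numeric_grad(func, x, y, eps=1e-6):
    # Central differences in each variable.
    dx = (func(x + eps, y) - func(x - eps, y)) / (2 * eps)
    dy = (func(x, y + eps) - func(x, y - eps)) / (2 * eps)
    return (dx, dy)

max_error = 0.0
for x, y in [(0.0, 0.0), (1.0, 2.0), (-1.5, 0.5), (2.0, -3.0)]:
    gx, gy = numeric_grad(phi, x, y)
    fx, fy = field(x, y)
    max_error = max(max_error, abs(gx - fx), abs(gy - fy))
print(max_error)
```

A very small printed error indicates that grad φ agrees with F at the sample points, as the hand computation predicts.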

The procedure wc have just applied works when the open .set C is a 
rectangle. Ibr simplicity. Id a. h be numbers > 0 and consider the rec¬ 
tangle --a < X < a and —/> < y < />, We are given two continuous 
fiincti<Mis /i.r. ;/) and y(x. y) in this rectangle. We a.s.sumo that their 
partial d<Tivativos exist, are continuous, and that 


fV ^ . 

dij di¬ 
ll will he convenient to use also th(“ I) notation, so that this relation reads 



We wisli to hml a potential function for F. Let 

^(x, y) = r/(/, y) (it. 

Ji) 

In this integral, we regard y as con.stant. By definition, wc obtain 


Di^(x, y) = fix, y). 


[V, §11 


POTENTIAL FUNCTIONS 


55 


What is $D_2 \psi$? It can be shown that $D_2$ can be moved inside the integral
sign. (We do not do it here because the proof has too many $\epsilon$ and $\delta$ in it.)
Thus we obtain

$$D_2 \psi(x, y) = \int_0^x D_2 f(t, y) \, dt = \int_0^x D_1 g(t, y) \, dt.$$

Since the integral of the derivative is equal to the function, this last
expression gives

$$D_2 \psi(x, y) = g(t, y) \Big|_{t=0}^{t=x} = g(x, y) - g(0, y).$$

Now let

$$u(y) = \int_0^y g(0, t) \, dt,$$

so that the derivative of $u(y)$ is $g(0, y)$. Finally, let

$$\varphi(x, y) = \psi(x, y) + u(y).$$

We contend that $\varphi$ is the desired potential function.

Taking the partial with respect to $x$ kills the term $u(y)$. Hence

$$D_1 \varphi(x, y) = D_1 \psi(x, y) = f(x, y).$$

As for the second partial,

$$D_2 \varphi(x, y) = D_2 \psi(x, y) + D_2 u(y) = g(x, y) - g(0, y) + g(0, y) = g(x, y).$$

This concludes our argument that $\varphi$ is a potential function.
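
The construction just described can be sketched numerically: integrate the first component in $x$ from 0, then add the integral of $g(0, t)$ in $y$. The sample field $(2xy, x^2 + 3y^2)$, the Simpson rule, and the names `simpson` and `potential` are assumptions made for this illustration only.

```python
def simpson(func, lo, hi, n=200):
    """Composite Simpson rule for the integral of func from lo to hi (n even)."""
    if lo == hi:
        return 0.0
    step = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * step)
    return total * step / 3


def f(x, y):
    return 2 * x * y            # first component of the sample field


def g(x, y):
    return x**2 + 3 * y**2      # second component; note D2 f = D1 g = 2x


def potential(x, y):
    """phi(x, y) = integral_0^x f(t, y) dt + integral_0^y g(0, t) dt."""
    psi = simpson(lambda t: f(t, y), 0.0, x)     # y is held constant here
    u = simpson(lambda t: g(0.0, t), 0.0, y)     # the correcting function u(y)
    return psi + u


print(potential(1.0, 2.0), 1.0**2 * 2.0 + 2.0**3)  # the two values should agree closely
```

For this sample field the integrands are polynomials of degree at most 2, so Simpson's rule introduces only rounding error.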


Exercises

Determine which of the following vector fields admit potential functions.

1. (e*. sinij/) 2. (2x-‘y,y^) 

3. i2xy, y^) 4. x + y^) 

Find potential functions for the following vector fields.

5. (a) $F(X) = \dfrac{1}{r} X$ (b) $F(X) = \dfrac{1}{r^2} X$

(c) $F(X) = r^n X$ (if $n$ is an integer $\neq -2$). (In this Exercise, $r = \|X\|$,
and $X \neq 0$.)

6. i4xy, 2x2) 

8, i^x^y^, 2 x^y) 

10. (i/c'^ xe'*-) 


7. (xy co.s ly -f sin xy, x^ cos xy) 
0. (2x, 4y^} 





11. Let $r = \|X\|$. Let $g$ be a differentiable function of one variable. Show
that the vector field defined by

$$F(X) = \frac{g'(r)}{r} X$$

in the domain $X \neq 0$ always admits a potential function. What is this potential
function?

12. Generalize Theorem 3 to functions of three variables, indicating how a 
proof might go along the same lines as the proof we sketched for two variables. 


§2. Line integrals 

Let $U$ be an open set (of $n$-space), and let $F$ be a vector field on $U$.
We can represent $F$ by components:

$$F(X) = (f_1(X), \ldots, f_n(X)),$$

each $f_i$ being a function. When $n = 2$,

$$F(X) = (f(x, y), g(x, y)).$$

If each function $f_1(X), \ldots, f_n(X)$ is continuous, then we shall say that
$F$ is a continuous vector field. If each function $f_1(X), \ldots, f_n(X)$ is
differentiable, then we shall say that $F$ is a differentiable vector field.

We shall also deal with curves. Rather than use the letter $X$ to denote
a curve, we shall use another letter, for instance $C$, to avoid certain
confusions which might arise in the present context. Furthermore, it is now
convenient to assume that our curve $C$ is defined on a closed interval
$I = [a, b]$, with $a < b$. For each number $t$ in $I$, the value $C(t)$ is a point
in $n$-space. We shall say that the curve $C$ lies in $U$ if $C(t)$ is a point of
$U$ for all $t$ in $I$. We say that $C$ is continuously differentiable if its derivative
$C'(t) = dC/dt$ exists and is continuous.

Let $F$ be a continuous vector field on $U$, and let $C$ be a continuously
differentiable curve in $U$. The dot product

$$F(C(t)) \cdot \frac{dC}{dt}$$

is a function of $t$, and it can be shown easily that this function is
continuous (by $\epsilon$ and $\delta$ techniques which we always omit). We suppose that
$C$ is defined on the interval $[a, b]$. We define the integral of $F$ along $C$ to be

$$\int_C F = \int_a^b F(C(t)) \cdot \frac{dC}{dt} \, dt.$$

This integral is a direct generalization of the familiar notion of integral of 





functions of one variable. If we are given a function $f(u)$, and $u$ is a
function of $t$, then

$$\int_{u(a)}^{u(b)} f(u) \, du = \int_a^b f(u(t)) \frac{du}{dt} \, dt.$$

(This is the formula describing the substitution method for evaluating
integrals.)

In $n$-space, $C(a)$ and $C(b)$ are points, and our curve passes through
these two points. Thus the integral we have written down can be
interpreted as an integral of the vector field, along the curve, between the two
points. It will be convenient to write the integral in the form

$$\int_{P, C}^{Q} F$$

to denote the integral along the curve $C$, from $P$ to $Q$.

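Example. Let $F(x, y) = (x^2 y, y^3)$. Find the integral of $F$ along the
straight line from the origin to the point $(1, 1)$.

We can parametrize the line in the form

$$C(t) = (t, t),$$

with $0 \leq t \leq 1$. Thus

$$F(C(t)) = (t^3, t^3).$$

Furthermore,

$$\frac{dC}{dt} = (1, 1).$$

Hence

$$F(C(t)) \cdot \frac{dC}{dt} = 2t^3.$$

The integral we must find is therefore equal to:

$$\int_C F = \int_0^1 2t^3 \, dt = \frac{2t^4}{4} \Big|_0^1 = \frac{1}{2}.$$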

Remark. If we are given a finite number of continuously differentiable 
curves forming a path as indicated on the following figure: 






then the integral over the path is simply the sum of the integrals over
each segment. Such a path is called piecewise continuously differentiable.

Thus a piecewise continuously differentiable curve $C$ consists of a
sequence $\{C_1, \ldots, C_m\}$, where each $C_i$ is a continuously differentiable
curve, defined on an interval $[a_i, b_i]$, such that the end point of $C_i$ is the
beginning point of $C_{i+1}$, in other words

$$C_i(b_i) = C_{i+1}(a_{i+1}).$$

We define the integral of $F$ along such a curve $C$ to be the sum

$$\int_C F = \int_{C_1} F + \int_{C_2} F + \cdots + \int_{C_m} F.$$

We say that our curve $C$ is a closed curve if the end point of $C_m$ is the
beginning point of $C_1$.

In the following picture, we have drawn a closed curve such that the
beginning point of $C_1$, namely $P_1$, is the end point of the curve $C_4$, which
joins $P_4$ to $P_1$.
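
A minimal sketch of the integral along a piecewise curve, taking each piece to be a straight segment (an assumption made to keep the code short). The closed square, the gradient field $(2xy, x^2 + 3y^2)$ and the rotating field $(-y, x)$ are illustrative choices, as are the function names.

```python
def segment_integral(F, P, Q, n=200):
    """Integral of F along the segment C(t) = P + t(Q - P), 0 <= t <= 1."""
    d = tuple(q - p for p, q in zip(P, Q))      # C'(t) is the constant vector Q - P

    def integrand(t):
        point = tuple(p + t * di for p, di in zip(P, d))
        return sum(fi * di for fi, di in zip(F(*point), d))

    step = 1.0 / n
    total = integrand(0.0) + integrand(1.0)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * step)
    return total * step / 3


def path_integral(F, vertices):
    """Sum of the integrals over consecutive segments of a polygonal path."""
    return sum(segment_integral(F, P, Q) for P, Q in zip(vertices, vertices[1:]))


# Closed curve: the last vertex repeats the first one.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
grad_field = lambda x, y: (2 * x * y, x**2 + 3 * y**2)   # gradient of x^2 y + y^3
rot_field = lambda x, y: (-y, x)                         # not a gradient field

print(path_integral(grad_field, square))  # near 0 for a field with a potential
print(path_integral(rot_field, square))   # near 2 for this square
```

The first result illustrates the statement of Exercise 14 below on one example; the second shows that a closed-curve integral need not vanish in general.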



Finally, we observe that in physics, one may interpret a vector field
$F(X)$ as describing a force. Then the integral of this vector field along a
curve $C$ describes the work done by the force along this curve.

Exercises

Compute the line integrals of the vector field over the indicated curves.

1. $F(x, y) = (x^2 - 2xy, y^2 - 2xy)$ along the parabola $y = x^2$ from $(-2, 4)$
to $(1, 1)$.

2. $(x, y, xz - y)$ over the line segment from $(0, 0, 0)$ to $(1, 2, 4)$.

3. Let $r = (x^2 + y^2)^{1/2}$. Let $F(X) = r^{-1} X$. Find the integral of $F(X)$ over
the circle of radius 2, taken in counterclockwise direction.

4. Let $C$ be a circle of radius 20 with center at the origin. Let $F(X)$ be a
vector field such that $F(X)$ has the same direction as $X$. What is the integral
of $F$ around $C$?





5. What is the work done by the force $F(x, y) = (x^2 - y^2, 2xy)$ by moving
a particle of mass $m$ along the square bounded by the coordinate axes and the
lines $x = 3$, $y = 3$ in counterclockwise direction?

6. Let $F(x, y) = (cxy, x^6 y^2)$, where $c$ is a positive constant. Let $a$, $b$ be
numbers $> 0$. Find a value of $a$ in terms of $c$ such that the line integral of $F$ along
the curve $y = ax^b$ from $(0, 0)$ to the line $x = 1$ is independent of $b$.

Find the values of the indicated integrals of vector fields along the given 
curves. 

7. $(y^2, -x)$ along the parabola $x = y^2/4$ from $(0, 0)$ to $(1, 2)$.

3 j) along the arc in the first quadrant of the circle x^ + y^ = -i 

from (0, 2) to (2, 0). 

9 . (x^y^, xy^) along the closed curve forme<l by parts of the line x = 1 and 
the parabola y'^ = x, counterclockwise. 

jQ (^2 _ J.J counterclockwise around the circle x^ y^ = 4. 

11. The vector field

$$\left( \frac{-y}{x^2 + y^2}, \frac{x}{x^2 + y^2} \right)$$

counterclockwise along the circle $x^2 + y^2 = 2$ from $(1, 1)$ to $(-\sqrt{2}, 0)$.

12. The same vector field along the line $x + y = 1$ from $(0, 1)$ to $(1, 0)$.

13. $(2xy, -3xy)$ clockwise around the square bounded by the lines $x = 3$,
$x = 5$, $y = 1$, $y = 3$.

14. Let $F$ be a continuous vector field on an open set $U$. Suppose that
$F = \operatorname{grad} \varphi$ for some differentiable function $\varphi$ on $U$. Prove that the line integral
of $F$ around any closed continuously differentiable curve in $U$ is equal to 0.
Prove the same conclusion if the curve is assumed to be only piecewise
continuously differentiable.

15. Conversely, let $F$ be a continuous vector field on an open set $U$. Assume
that any two points of $U$ can be joined by a piecewise continuously differentiable
curve, and assume that the integral of $F$ along any closed piecewise continuously
differentiable curve in $U$ is equal to 0. Prove that $F$ is the gradient of some
function on $U$. [Hint: Let $P_0$ be a fixed point of $U$, and for any point $P$ of $U$ define
$\varphi(P)$ to be the value of the integral

$$\int_{P_0}^{P} F$$

taken along any piecewise continuously differentiable curve between $P_0$ and $P$.
Show that $\varphi$ has partial derivatives, and that its gradient is $F$.]




CHAPTER VI 


Taylor’s Formula 

In this chapter, we discuss two things which are of independent interest.
First, we define partial differential operators (with constant coefficients).
It is very useful to have facility in working with these formally.

Secondly, we apply them to the derivation of Taylor's formula for
functions of several variables, which will be very similar to the formula
for one variable. The formula, as before, tells us how to approximate a
function by means of polynomials. In the present theory, these polynomials
involve several variables, of course. We shall see that they are
hardly more difficult to handle than polynomials in one variable in the
matters under consideration.

The proof that the partial derivatives commute is tricky. It can be
omitted without harm in a class allergic to theory, because the technique
involved never reappears in the rest of this book.


§1. Repeated partial derivatives 

Let $f$ be a function of two variables, defined on an open set $U$ in 2-space.
Assume that its first partial derivative exists. Then $D_1 f$ (which we also
write $\partial f/\partial x$ if $x$ is the first variable) is a function defined on $U$. We may
then ask for its first or second partial derivative, i.e. we may form $D_2 D_1 f$
or $D_1 D_1 f$ if these exist. Similarly, if $D_2 f$ exists, and if the first partial
derivative of $D_2 f$ exists, we may form $D_1 D_2 f$.

Suppose that we write $f$ in terms of the two variables $(x, y)$. Then we
can write

$$D_1 D_2 f(x, y) = \frac{\partial}{\partial x} \left( \frac{\partial f}{\partial y} \right) = (D_1(D_2 f))(x, y),$$

and

$$D_2 D_1 f(x, y) = \frac{\partial}{\partial y} \left( \frac{\partial f}{\partial x} \right) = (D_2(D_1 f))(x, y).$$

For example, let $f(x, y) = \sin(xy)$. Then

$$\frac{\partial f}{\partial x} = y \cos(xy) \quad \text{and} \quad \frac{\partial f}{\partial y} = x \cos(xy).$$

Hence

$$D_2 D_1 f(x, y) = -xy \sin(xy) + \cos(xy).$$





But differentiating $\partial f/\partial y$ with respect to $x$, we see that

$$D_1 D_2 f(x, y) = -xy \sin(xy) + \cos(xy).$$

These two repeated partial derivatives are equal!
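
The equality of the two repeated partials of $\sin(xy)$ can be spot-checked with nested central differences. This is only a numerical sanity check under an assumed step size; the helper names `d1`, `d2` and the test point are illustrative.

```python
import math


def f(x, y):
    return math.sin(x * y)


def d1(func, h=1e-4):
    """Return a function approximating the first partial D1 func."""
    return lambda x, y: (func(x + h, y) - func(x - h, y)) / (2 * h)


def d2(func, h=1e-4):
    """Return a function approximating the second partial D2 func."""
    return lambda x, y: (func(x, y + h) - func(x, y - h)) / (2 * h)


def exact(x, y):
    """The closed form computed in the text: -xy sin(xy) + cos(xy)."""
    return -x * y * math.sin(x * y) + math.cos(x * y)


d2d1 = d2(d1(f))   # first D1, then D2
d1d2 = d1(d2(f))   # first D2, then D1

x0, y0 = 0.7, 1.3
print(d2d1(x0, y0), d1d2(x0, y0), exact(x0, y0))  # three nearly equal values
```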

The next theorem tells us that in practice, this will always happen. 

Theorem 1. Let $f$ be a function of two variables, defined on an open set $U$
of 2-space. Assume that the partial derivatives $D_1 f$, $D_2 f$, $D_1 D_2 f$, and
$D_2 D_1 f$ exist and are continuous. Then

$$D_1 D_2 f = D_2 D_1 f.$$


Proof. A direct use of the definition of these partial and repeated
partial derivatives would lead to a blind alley. Hence we shall have to
use a special trick to pull through.

Let $(x, y)$ be a point in $U$, and let $H = (h, k)$ be small, $h \neq 0$, $k \neq 0$.
We consider the expression

$$g(x) = f(x, y + k) - f(x, y).$$

If we apply the mean value theorem to $g$, then we conclude that there
exists a number $s_1$ between $x$ and $x + h$ such that

$$g(x + h) - g(x) = g'(s_1) h,$$

or in other words, using the definition of partial derivative:

(1) $g(x + h) - g(x) = [D_1 f(s_1, y + k) - D_1 f(s_1, y)] h.$

But the difference on the left of this equation is

(2) $f(x + h, y + k) - f(x + h, y) - f(x, y + k) + f(x, y).$

On the other hand, we can now apply the mean value theorem to the
expression in brackets in (1) with respect to the second variable. If we do
this, we see that the long expression in (2) is equal to

(3) $D_2 D_1 f(s_1, s_2) kh$

for some number $s_2$ lying between $y$ and $y + k$.

We now start all over again, and consider the expression

$$g_2(y) = f(x + h, y) - f(x, y).$$

We apply the mean value theorem to $g_2$, and conclude that there is a





number $t_2$ between $y$ and $y + k$ such that

$$g_2(y + k) - g_2(y) = g_2'(t_2) k,$$

or in other words, is equal to

(4) $[D_2 f(x + h, t_2) - D_2 f(x, t_2)] k.$

If you work out $g_2(y + k) - g_2(y)$, you will see that it is equal to the
long expression of (2). Furthermore, proceeding as before, and applying
the mean value theorem to the first variable in (4), we see that (4) becomes

(5) $D_1 D_2 f(t_1, t_2) hk$

for some number $t_1$ between $x$ and $x + h$. Since (5) and (3) are both
equal to the long expression in (2), they are equal to each other. Thus
finally we obtain

$$D_2 D_1 f(s_1, s_2) kh = D_1 D_2 f(t_1, t_2) hk.$$

Since we assume from the beginning that $h \neq 0$ and $k \neq 0$, we can cancel
$hk$, and get

$$D_2 D_1 f(s_1, s_2) = D_1 D_2 f(t_1, t_2).$$

Now as $h$, $k$ approach 0, the left side of this equation approaches $D_2 D_1 f(x, y)$
because $D_2 D_1 f$ is assumed to be continuous. Similarly, the right-hand side
approaches $D_1 D_2 f(x, y)$. We can therefore conclude that

$$D_1 D_2 f(x, y) = D_2 D_1 f(x, y),$$

as desired.

Consider now a function of three variables $f(x, y, z)$. We can then take
three kinds of partial derivatives: $D_1$, $D_2$, or $D_3$ (in other notation,
$\partial/\partial x$, $\partial/\partial y$, and $\partial/\partial z$). Let us assume throughout that all the partial
derivatives which we shall consider exist and are continuous, so that we
may form as many repeated partial derivatives as we please. Then using
Theorem 1, we can show that it does not matter in which order we take
these partials.

For instance, we see that

$$D_1 D_3 f = D_3 D_1 f.$$

This is simply an application of Theorem 1, keeping the second variable
fixed. We may take a further partial derivative, for instance

$$D_1 D_3 D_1 f.$$

Here $D_1$ occurs twice and $D_3$ once. Then this expression will be equal to
any other repeated partial derivative of $f$ in which $D_1$ occurs twice and
$D_3$ once. For example, we apply the theorem to the function $(D_1 f)$.




Then the theorem allows us to interchange $D_1$ and $D_3$ in front of $(D_1 f)$
(always assuming that all partials we want to take exist and are
continuous). We obtain

$$D_1 D_3 (D_1 f) = D_3 D_1 (D_1 f).$$

As another example, consider

(6) $D_2 D_1 D_3 D_2 f.$

We wish to show that it is equal to $D_1 D_2 D_2 D_3 f$. By Theorem 1, we have
$D_3 D_2 f = D_2 D_3 f$. Hence:

(7) $D_2 D_1 (D_3 D_2 f) = D_2 D_1 (D_2 D_3 f).$

We then apply Theorem 1 again, and interchange $D_2$ and $D_1$ to obtain
the desired expression.

In general, suppose that we are given three positive integers $m_1$, $m_2$,
and $m_3$. We wish to take the repeated partial derivatives of $f$ by using
$m_1$ times the first partial $D_1$, using $m_2$ times the second partial $D_2$, and
using $m_3$ times the third partial $D_3$. Then it does not matter in which
order we take these partial derivatives, we shall always get the same
answer.

To see this, note that by repeated application of Theorem 1, we can
always interchange any occurrence of $D_3$ with $D_2$ or $D_1$ so as to push $D_3$
towards the right. We can perform such interchanges until all occurrences
of $D_3$ occur furthest to the right, in the same way as we pushed $D_3$
towards the right going from expression (6) to expression (7). Once this
is done, we start interchanging $D_2$ with $D_1$ until all occurrences of $D_2$
pile up just behind $D_3$. Once this is done, we are left with $D_1$ repeated a
certain number of times on the left.

No matter with what arrangement of $D_1$, $D_2$, $D_3$ we started, we end up
with the same arrangement, namely

$$\underbrace{D_1 \cdots D_1}_{m_1} \; \underbrace{D_2 \cdots D_2}_{m_2} \; \underbrace{D_3 \cdots D_3}_{m_3} f,$$

with $D_1$ occurring $m_1$ times, $D_2$ occurring $m_2$ times, and $D_3$ occurring
$m_3$ times.

Exactly the same argument works for functions of more variables.
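
The rearrangement argument amounts to sorting a word of indices by interchanging neighbours, each interchange being one use of Theorem 1. A minimal sketch, where the list encoding of the operators and the name `to_standard_order` are assumptions made for illustration:

```python
def to_standard_order(indices):
    """Sort a word such as [2, 1, 3, 2] (standing for D2 D1 D3 D2 f) by adjacent swaps.

    Each swap of two neighbouring partials corresponds to one application of
    Theorem 1. Returns the rearranged word and the number of interchanges used.
    """
    word = list(indices)
    swaps = 0
    changed = True
    while changed:
        changed = False
        for k in range(len(word) - 1):
            if word[k] > word[k + 1]:
                word[k], word[k + 1] = word[k + 1], word[k]
                swaps += 1
                changed = True
    return word, swaps


word, swaps = to_standard_order([2, 1, 3, 2])
print(word, swaps)
```

Whatever arrangement one starts from, the resulting word depends only on how many times each index occurs, which is the content of the paragraph above.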

Exercises 

Find the partial derivatives of order 2 for the following functions and verify
explicitly in each case that $D_1 D_2 f = D_2 D_1 f$.

1. 2. sin {xy) 

3. x'^t/ + ‘ixy 4. 2 iy + y'^ 





5. 6. sin (x^H- y) 

7 . cos (x^ + xy) 8. arctan (x^ — 2 xy) 

9. e*+‘' 10* sin (x + y). 

Find D1D2D3/ and D3D2D1/ in the following cases. 

11 . xyz 12. xV 

13 . c*‘'* 14 . sin (x2/2) 

15 . cos (x + y + 2) 16 . sin (x -f- y + 2) 

17 . (x^ + + 2^)”* 18 . i^y^2 + 2(x + y + 2). 

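19. Let $x = r \cos\theta$ and $y = r \sin\theta$. Let $f(x, y) = g(r, \theta)$. Show that

$$\frac{\partial}{\partial x} = \cos\theta \, \frac{\partial}{\partial r} - \frac{\sin\theta}{r} \frac{\partial}{\partial \theta},$$

$$\frac{\partial}{\partial y} = \sin\theta \, \frac{\partial}{\partial r} + \frac{\cos\theta}{r} \frac{\partial}{\partial \theta},$$

the partials on the left being viewed as acting on $f$, and those on the right as
acting on $g$.

20. Let $x = r \cos\theta$ and $y = r \sin\theta$. Let $f(x, y) = g(r, \theta)$. Show that

$$\frac{\partial^2 g}{\partial r^2} + \frac{1}{r} \frac{\partial g}{\partial r} + \frac{1}{r^2} \frac{\partial^2 g}{\partial \theta^2} = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}.$$

21. Let $f(X) = g(r)$ (with $r = \|X\|$), and assume $X = (x, y, z)$. Show that

$$\frac{d^2 g}{dr^2} + \frac{2}{r} \frac{dg}{dr} = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}.$$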

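22. Let $f(x, y)$ satisfy $f(tx, ty) = t^n f(x, y)$ for all $t$ ($n$ being some integer $\geq 1$).
Show that

$$x \frac{\partial f}{\partial x} + y \frac{\partial f}{\partial y} = n f(x, y).$$

23. Let $f$ be as in Exercise 22. Show that

$$x^2 \frac{\partial^2 f}{\partial x^2} + 2xy \frac{\partial^2 f}{\partial x \, \partial y} + y^2 \frac{\partial^2 f}{\partial y^2} = n(n - 1) f(x, y).$$

(It is understood throughout that all functions are as many times differentiable
as is necessary.)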


§2. Partial differential operators

We shall continue the discussion at the end of the last section, but we 
shall build up a convenient system to talk about iterated partial derivatives. 




For simplicity, let us begin with functions of one variable $x$. We can
then take only one type of derivative,

$$D = \frac{d}{dx}.$$

Let $f$ be a function of one variable, and let us assume that all the iterated
derivatives of $f$ exist. Let $m$ be a positive integer. Then we can take the
$m$-th derivative of $f$, which we once denoted by $f^{(m)}$. We now write it

$$D D \cdots D f \quad \text{or} \quad \frac{d}{dx} \left( \frac{d}{dx} \cdots \left( \frac{d}{dx} f \right) \cdots \right),$$

the derivative $D$ (or $d/dx$) being iterated $m$ times. What matters here is
the number of times $D$ occurs. We shall use the notation $D^m$ or $(d/dx)^m$
to mean the iteration of $D$, $m$ times. Thus we write

$$D^m f \quad \text{or} \quad \left( \frac{d}{dx} \right)^m f$$

instead of the above expressions. This is shorter. But even better, we
have the rule

$$D^m D^n f = D^{m+n} f$$

for any positive integers $m$, $n$. So this iteration of derivatives begins to
look like a multiplication. Furthermore, if we define $D^0 f$ to be simply $f$,
then the rule above also holds if $m$, $n$ are $\geq 0$.

The expression $D^m$ will be called a simple differential operator of order $m$
(in one variable, so far).

Let us now look at the case of two variables, say $(x, y)$. We can then
take two partials $D_1$ and $D_2$ (or $\partial/\partial x$ and $\partial/\partial y$). Let $m_1$, $m_2$ be two
integers $\geq 0$. Instead of writing

$$\underbrace{D_1 \cdots D_1}_{m_1} \; \underbrace{D_2 \cdots D_2}_{m_2} f,$$

we shall write

$$D_1^{m_1} D_2^{m_2} f \quad \text{or} \quad \left( \frac{\partial}{\partial x} \right)^{m_1} \left( \frac{\partial}{\partial y} \right)^{m_2} f.$$

For instance, taking $m_1 = 2$ and $m_2 = 5$ we would write

$$D_1^2 D_2^5 f.$$

This means: take the first partial twice and the second partial five times
(in any order). (We assume throughout that all repeated partials exist
and are continuous.)





An expression of type

$$D_1^{m_1} D_2^{m_2}$$

will be called a simple differential operator, and we shall say that its order
is $m_1 + m_2$. In the example we just gave, the order is $2 + 5 = 7$.

It is now clear how to proceed with three or more variables, and it is
no harder to express our thoughts in terms of $n$ variables than in terms of
three. Consequently, if we deal with functions of $n$ variables, all of whose
repeated partial derivatives exist and are continuous in some open set $U$,
and if $D_1, \ldots, D_n$ denote the partial derivatives with respect to these
variables, then we call an expression

$$D_1^{m_1} \cdots D_n^{m_n} \quad \text{or} \quad \left( \frac{\partial}{\partial x_1} \right)^{m_1} \cdots \left( \frac{\partial}{\partial x_n} \right)^{m_n}$$

a simple differential operator, $m_1, \ldots, m_n$ being integers $\geq 0$. We say
that its order is $m_1 + \cdots + m_n$.

Given a function $f$ (satisfying the above stated conditions), and a simple
differential operator $D$, we write $Df$ to mean the function obtained from $f$
by applying repeatedly the partial derivatives $D_1, \ldots, D_n$, the number
of times being the number of times each $D_i$ occurs in $D$.


Example 1. Consider functions of three variables $(x, y, z)$. Then

$$D = \left( \frac{\partial}{\partial x} \right)^3 \left( \frac{\partial}{\partial y} \right)^5 \left( \frac{\partial}{\partial z} \right)^2$$

is a simple differential operator of order $3 + 5 + 2 = 10$. Let $f$ be a
function of three variables satisfying the usual hypotheses. To take $Df$
means that we take the partial derivative with respect to $z$ twice, the
partial with respect to $y$ five times, and the partial with respect to $x$ three
times.

We observe that a simple differential operator gives us a rule which to
each function $f$ associates another function $Df$.

As a matter of notation, referring to Example 1, one would also write
the differential operator $D$ in the form

$$D = \frac{\partial^{10}}{\partial x^3 \, \partial y^5 \, \partial z^2}.$$

In this notation, one would thus have




All the above notations are used in the scientific literature, and this is the 
reason for including them here. 

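Warning. Do not confuse the two expressions

$$\left( \frac{\partial}{\partial x} \right)^2 f = \frac{\partial^2 f}{\partial x^2} \quad \text{and} \quad \left( \frac{\partial f}{\partial x} \right)^2,$$

which are usually not equal. For instance, if $f(x, y) = x^2 y$, then

$$\frac{\partial^2 f}{\partial x^2} = 2y \quad \text{and} \quad \left( \frac{\partial f}{\partial x} \right)^2 = 4x^2 y^2.$$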

We shall now show how one can add simple differential operators and 
multiply them by constants. 

Let $D$, $D'$ be two simple differential operators. For any function $f$ we
define $(D + D')f$ to be $Df + D'f$. If $c$ is a number, then we define $(cD)f$
to be $c(Df)$. In this manner, taking iterated sums, and products with
constants, we obtain what we shall call differential operators. Thus a
differential operator $D$ is a sum of terms of type

$$c D_1^{m_1} \cdots D_n^{m_n},$$

where $c$ is a number and $m_1, \ldots, m_n$ are integers $\geq 0$.

Example 2. Dealing with two variables, we see that

$$D = 3 \frac{\partial}{\partial x} + 5 \left( \frac{\partial}{\partial x} \right)^2 - \pi \frac{\partial}{\partial x} \frac{\partial}{\partial y}$$

is a differential operator.

Let $f(x, y) = \sin(xy)$. We wish to find $Df$. By definition,

$$Df(x, y) = 3y \cos(xy) + 5\bigl(-y^2 \sin(xy)\bigr) - \pi \bigl[ y(-\sin(xy))x + \cos(xy) \bigr].$$

We see that a differential operator gives rise to a rule which allows us
to associate with each function $f$ (satisfying the usual conditions) another
function $Df$.

Let $c$ be a number and $f$ a function. Let $D_i$ be any partial derivative.
Then

$$D_i(cf) = c D_i f.$$

This is simply the old property that the derivative of a constant times a
function is equal to the constant times the derivative of the function.
Iterating partial derivatives, we see that this same property applies to
differential operators. For any differential operator $D$, and any number $c$,
we have

$$D(cf) = c Df.$$





Furthermore, if $f$, $g$ are two functions (defined on the same open set,
and having continuous partial derivatives of all orders), then for any
partial derivative $D_i$ we have

$$D_i(f + g) = D_i f + D_i g.$$

Iterating the partial derivatives, we find that for any differential operator
$D$, we have

$$D(f + g) = Df + Dg.$$

Having learned how to add differential operators, we now learn how to
multiply them.

Let $D$, $D'$ be two differential operators. Then we define the differential
operator $DD'$ to be the one obtained by taking first $D'$ and then $D$. In
other words, if $f$ is a function, then

$$(DD')f = D(D'f).$$


Example 3 . Let 



and 



Then 




+ ^ ® 



Differential operators multiply just like polynomials and numbers, and
their addition and multiplication satisfy all the rules of addition and
multiplication of polynomials. For instance:

If $D$, $D'$ are two differential operators, then

$$DD' = D'D.$$

If $D$, $D'$, $D''$ are three differential operators, then

$$D(D' + D'') = DD' + DD''.$$

It would be tedious to list all the properties here and to give in detail
all the proofs (even though these are quite simple). We shall therefore
omit these proofs. The main purpose of this section is to insure that you
develop as great a facility in adding and multiplying differential operators
as you have in adding and multiplying numbers or polynomials.
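
Since differential operators with constant coefficients multiply like polynomials, one way to sketch the rules in two variables is to store an operator as a dictionary from exponent pairs $(m_1, m_2)$ to coefficients. The encoding, the helper names `op_add` and `op_mul`, and the two sample operators $A = D_1 + 2D_2$ and $B = 3D_1 - D_2$ are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def op_add(A, B):
    """Sum of two operators stored as {(m1, m2): coefficient}."""
    out = defaultdict(int)
    for key, coeff in list(A.items()) + list(B.items()):
        out[key] += coeff
    return {k: v for k, v in out.items() if v != 0}


def op_mul(A, B):
    """Product of two operators: exponents add, coefficients multiply."""
    out = defaultdict(int)
    for (a1, a2), ca in A.items():
        for (b1, b2), cb in B.items():
            out[(a1 + b1, a2 + b2)] += ca * cb
    return {k: v for k, v in out.items() if v != 0}


A = {(1, 0): 1, (0, 1): 2}    # D1 + 2 D2
B = {(1, 0): 3, (0, 1): -1}   # 3 D1 - D2

product = op_mul(A, B)        # 3 D1^2 + 5 D1 D2 - 2 D2^2
print(product)
print(op_mul(A, B) == op_mul(B, A))   # commutativity on this example
```

The dictionary returned by `op_mul` is a sum of terms $c D_1^{m_1} D_2^{m_2}$ with distinct exponent pairs, so it is a convenient place to read off the order of each term.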

When a differential operator is written as a sum of terms of type

$$c D_1^{m_1} \cdots D_n^{m_n},$$

then we shall say that it is in standard form.




For example, 



is in standard form, but 


is not. 

Each term

$$c D_1^{m_1} \cdots D_n^{m_n}$$

is said to have degree $m_1 + \cdots + m_n$. If a differential operator is
expressed as a sum of simple differential operators which all have the same
degree, say $m$, then we say that it is homogeneous of degree $m$.

The differential operator of Example 2 is not homogeneous. The
differential operator $DD'$ of Example 3 is homogeneous of degree 2.




Exercises 


Put the following differential operators in standard form. 

1. (3Di + 2D2f 2. {Di + Z)2 + Dsf 

3. (Di - D2)(Di + D 2 ) 

5. {Di "h D 2 ) 

7. (2Di - 3D2 )(Di + D 2 ) 


0 


11 


■ ('■ h +‘ f) 


4. (D, + D 2 )" 

6. (D, + D 2 )* 

8. (D, - Dj)(D2+ 5 D 3 ) 

-■ (sV • ff 


Find the values of the differential operator of Exercise 10 applied to the 
following function.s at the given point. 

13. x'‘‘y at (0, 1) at (1, 1) 

15. sin (jTi/) at (O,*-) 10. e'*' at (0,0). 

17. Let $f$, $g$ be two functions (of two variables) with continuous partial
derivatives of order $\leq 2$ in an open set $U$. Assume that


dx 



and 


dy dx 


byt 



Show that 





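18. Let $f$ be a function of three variables, defined for $X \neq 0$ by $f(X) = 1/\|X\|$.
Show that

$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} = 0.$$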

19. In Exercise 20 of the preceding section, compute 

in terms of $\partial/\partial r$ and $\partial/\partial\theta$. Watch out! The coefficients are not constant.


§3. Taylor's formula

Let $f$ be a function defined on an open set $U$. Let $P$ be a point in this
set, and let $H$ be a non-zero vector. If $t$ is a small number, then $tH$ is
small, and hence $P + tH$ will lie in $U$. Thus $f(P + tH)$ will be defined,
and there always exists an open interval (possibly small) such that
$f(P + tH)$ is defined. For $t$ in such an interval, we can take the derivative,
by the chain rule, as in Chapter IV:

$$\frac{d}{dt} \bigl( f(P + tH) \bigr) = \operatorname{grad} f(P + tH) \cdot H.$$

We are now interested in taking the derivative once more. To do this and
see clearly what is happening, we need to use another notation.

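Let $H = (h_1, \ldots, h_n)$. Instead of writing $\operatorname{grad} f$ we shall write

$$\nabla f,$$

where $\nabla$ stands for the symbol

$$\nabla = \left( \frac{\partial}{\partial x_1}, \ldots, \frac{\partial}{\partial x_n} \right).$$

(This is not a differential operator in the sense of §2 because $\nabla f$ is not
a function!)

We shall write $H \cdot \nabla$ for the differential operator

$$H \cdot \nabla = h_1 \frac{\partial}{\partial x_1} + \cdots + h_n \frac{\partial}{\partial x_n}.$$

Example 1. If $H = (3, -1)$ and $\nabla = \left( \dfrac{\partial}{\partial x}, \dfrac{\partial}{\partial y} \right)$, then

$$H \cdot \nabla = 3 \frac{\partial}{\partial x} - \frac{\partial}{\partial y}.$$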





The expression giving us the derivative

$$\frac{d}{dt} f(P + tH)$$

can be written as $H \cdot (\operatorname{grad} f(P + tH))$, or in terms of coordinates:

$$h_1 D_1 f(P + tH) + \cdots + h_n D_n f(P + tH).$$

The advantage of our abbreviated notation now becomes clear. We
recognize this expression to be none other than $Df(P + tH)$, where $D$ is
the differential operator $H \cdot \nabla$. In other words, our expression is equal to

$$\bigl( (H \cdot \nabla) f \bigr)(P + tH),$$

which we write also as

$$(H \cdot \nabla) f(P + tH),$$

eliminating a set of parentheses for simplicity.

We are now in a position to give the general formula for the iterated
derivatives of $f(P + tH)$.


Theorem 2. Let $r$ be a positive integer. Let $f$ be a function defined on
an open set $U$, and having continuous partial derivatives of orders $\leq r$.
Let $P$ be a point of $U$, and $H$ a vector. Then

$$\left( \frac{d}{dt} \right)^r \bigl( f(P + tH) \bigr) = (H \cdot \nabla)^r f(P + tH)$$

for all values of $t$ such that $P + tH$ lies in $U$.

Proof. For $r = 1$, we have just verified our assertion. Consider next
$r = 2$. Let

$$g = (H \cdot \nabla) f.$$

We must find the derivative

$$\frac{d}{dt} \bigl( g(P + tH) \bigr).$$

As we have seen, it is equal to

$$\bigl( (H \cdot \nabla) g \bigr)(P + tH).$$

Substituting the value for $g$, we get

$$\bigl( (H \cdot \nabla)^2 f \bigr)(P + tH).$$

This proves our assertion for $r = 2$.





We can proceed stepwise, and use a similar argument for $r = 3$. In
general, we shall now show how to go from one step to the next. Suppose we
have proved our result for step number $s$. Thus we have proved that

$$\left( \frac{d}{dt} \right)^s \bigl( f(P + tH) \bigr) = (H \cdot \nabla)^s f(P + tH).$$

To find the next derivative, we let

$$g = (H \cdot \nabla)^s f.$$

We must then find

$$\frac{d}{dt} \bigl( g(P + tH) \bigr).$$

We know that this is equal to

$$(H \cdot \nabla) g(P + tH).$$

Substituting the expression defining $g$, we obtain

$$(H \cdot \nabla)^{s+1} f(P + tH),$$

as desired.
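
For $r = 2$ and two variables, the statement says that the second derivative of $t \mapsto f(P + tH)$ equals $h_1^2 D_1^2 f + 2 h_1 h_2 D_1 D_2 f + h_2^2 D_2^2 f$ evaluated at $P + tH$. The sketch below checks this numerically for the sample function $\log(1 + x + 2y)$; the function, the point $P$, the vector $H$ and the step size are assumptions chosen for illustration.

```python
import math


def f(x, y):
    return math.log(1 + x + 2 * y)   # sample function, chosen for illustration


# Second-order partials of the sample function, computed by hand.
def f_xx(x, y):
    return -1 / (1 + x + 2 * y) ** 2


def f_xy(x, y):
    return -2 / (1 + x + 2 * y) ** 2


def f_yy(x, y):
    return -4 / (1 + x + 2 * y) ** 2


def second_derivative_along_line(P, H, t, step=1e-4):
    """Central second difference of the one-variable function t -> f(P + tH)."""
    g = lambda s: f(P[0] + s * H[0], P[1] + s * H[1])
    return (g(t + step) - 2 * g(t) + g(t - step)) / step**2


def operator_side(P, H, t):
    """((H . nabla)^2 f)(P + tH) = h1^2 f_xx + 2 h1 h2 f_xy + h2^2 f_yy."""
    x, y = P[0] + t * H[0], P[1] + t * H[1]
    h1, h2 = H
    return h1 * h1 * f_xx(x, y) + 2 * h1 * h2 * f_xy(x, y) + h2 * h2 * f_yy(x, y)


P, H, t = (0.2, 0.1), (3.0, -1.0), 0.05
print(second_derivative_along_line(P, H, t), operator_side(P, H, t))
```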

Taylor’s formula for functions of several variables is an easy application 
of Theorem 2. 

Taylor's Formula. Let $f$ be a function defined on an open set $U$,
and having continuous partial derivatives up to order $s$. Let $P$ be a point
of $U$, and let $H$ be a vector. Assume that the line segment

$$P + tH, \qquad 0 \leq t \leq 1,$$

is contained in $U$. Let $D$ be the differential operator $H \cdot \nabla$. Then there
exists a number $\tau$ between 0 and 1 such that

$$f(P + H) = f(P) + \frac{Df(P)}{1!} + \frac{D^2 f(P)}{2!} + \cdots + \frac{D^{s-1} f(P)}{(s-1)!} + \frac{D^s f(P + \tau H)}{s!}.$$


Proof. Let $g(t) = f(P + tH)$. Then $g$ is a differentiable function of $t$
in the old sense of functions of one variable, and we can apply the ordinary
Taylor formula, between $t = 0$ and $t = 1$. In that case, all powers of
$(1 - 0) = 1$ are equal to 1. Hence Taylor's formula in one variable
applied to $g$ yields:

$$g(1) = g(0) + \frac{g'(0)}{1!} + \cdots + \frac{g^{(s-1)}(0)}{(s-1)!} + \frac{g^{(s)}(\tau)}{s!}$$

for some number $\tau$ between 0 and 1.

The successive derivatives of $g$ are given by Theorem 2. If we evaluate
them for $t = 0$ in the terms up to order $s - 1$, and for $t = \tau$ in the $s$-th



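term, then we see that the formula of the theorem simply drops out!
That's the proof.

Usually one takes $P$ to be the origin and $H = X$. In the case of two
variables, for instance,

$$X \cdot \nabla = x \frac{\partial}{\partial x} + y \frac{\partial}{\partial y}.$$

The terms of the formula up to order 2 would be:

$$f(x, y) = f(0, 0) + x \frac{\partial f}{\partial x}(0, 0) + y \frac{\partial f}{\partial y}(0, 0) + \frac{1}{2!} \left[ x^2 \frac{\partial^2 f}{\partial x^2}(0, 0) + 2xy \frac{\partial^2 f}{\partial x \, \partial y}(0, 0) + y^2 \frac{\partial^2 f}{\partial y^2}(0, 0) \right] + \cdots$$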

Writing down explicitly the terms as above is useful for computations,
but obviously unwieldy to carry out a general discussion of the situation.
In general, we would have

$$f(x, y) = G_{n-1}(x, y) + R_n,$$

where $G_{n-1}(x, y)$ is a polynomial in $x$, $y$ of degree $\leq n - 1$, which differs
from $f(x, y)$ by the remainder term $R_n$. If $n - 1 = 3$, for instance, we
have

$$G_3(x, y) = G_2(x, y) + \frac{1}{3!} \left[ D_1^3 f(0, 0) x^3 + 3 D_1^2 D_2 f(0, 0) x^2 y + 3 D_1 D_2^2 f(0, 0) x y^2 + D_2^3 f(0, 0) y^3 \right].$$

Usually, as $n$ becomes large, the polynomials $G_{n-1}$ give increasingly good
approximations to the function $f$. How good an approximation must of
course be determined in each case, by estimating the remainder, just as
we did for functions of one variable.

The above polynomial $G_{n-1}$ will be called the polynomial approximation
of $f$ up to degree $n - 1$.

Example 2. Find the polynomial approximation of the function

$$f(x, y) = \log(1 + x + 2y)$$

up to degree 2.





To do this, we first compute the partial derivatives. They are:

$$f(0, 0) = 0,$$

$$D_1 f(x, y) = \frac{1}{1 + x + 2y}, \qquad D_1 f(0, 0) = 1,$$

$$D_2 f(x, y) = \frac{2}{1 + x + 2y}, \qquad D_2 f(0, 0) = 2,$$

$$D_1^2 f(x, y) = -\frac{1}{(1 + x + 2y)^2}, \qquad D_1^2 f(0, 0) = -1,$$

$$D_2^2 f(x, y) = -\frac{4}{(1 + x + 2y)^2}, \qquad D_2^2 f(0, 0) = -4,$$

$$D_1 D_2 f(x, y) = -\frac{2}{(1 + x + 2y)^2}, \qquad D_1 D_2 f(0, 0) = -2.$$

Hence the polynomial approximation of $f$ up to degree 2 is

$$G_2(x, y) = x + 2y - \tfrac{1}{2}(x^2 + 4xy + 4y^2).$$
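
A short numerical comparison suggests how well the degree 2 polynomial of this example tracks $\log(1 + x + 2y)$ near the origin. The sample points and the helper name `error` are illustrative assumptions; the check is a sketch, not an estimate of the remainder.

```python
import math


def f(x, y):
    return math.log(1 + x + 2 * y)


def G2(x, y):
    """Degree 2 polynomial approximation: x + 2y - (x^2 + 4xy + 4y^2)/2."""
    return x + 2 * y - 0.5 * (x**2 + 4 * x * y + 4 * y**2)


def error(x, y):
    """Absolute difference between the function and its approximation."""
    return abs(f(x, y) - G2(x, y))


for scale in (1.0, 0.1):
    x, y = 0.01 * scale, 0.02 * scale
    print(scale, error(x, y))

# Shrinking the point by a factor of 10 shrinks the error by roughly 10^3,
# which is what a remainder of third order would lead one to expect.
```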



In some cases, one can avoid computing partial derivatives. For
instance, let $g$ be a function of one variable, having 3 continuous derivatives
in some interval containing 0. Then the Taylor formula for $g$ yields:

$$g(t) = g(0) + \frac{g'(0)}{1!} t + \frac{g''(0)}{2!} t^2 + \frac{g'''(\tau)}{3!} t^3$$

for some number $\tau$ between 0 and $t$.

If $\varphi(x, y)$ is some function of two variables whose values are such that
$g(\varphi(x, y))$ is defined, then we can substitute $\varphi(x, y)$ for $t$ in the expression
above.

For instance, let $g(t) = e^t$. To expand $e^{xy}$, we do not work out the partial
derivatives. We apply our knowledge of the Taylor series for $e^t$, which is

$$e^t = 1 + t + \frac{t^2}{2!} + \cdots$$

We obtain therefore

$$e^{xy} = 1 + xy + \frac{x^2 y^2}{2!} + \cdots$$

It can be shown that the terms which we obtain by this method will be
the same as those using the Taylor formula in several variables.

In any case, we see that our function $e^{xy}$ can be approximated by
polynomials in $x$ and $y$. For instance,

$$1 + xy + \frac{x^2 y^2}{2!}$$

is such a polynomial, and to determine the degree of accuracy of the
approximation, we have of course to estimate the remainder term.





Exercises 

Find the terms up to order 2 in the Taylor formula of the following functions 
(taking P = 0). 

I . sin {xij) 2. cos (ary) 3. log (1 + xy) 

4. sin(a:2+y2) 5 . 6 . cos (x^ + y) ^ 

7 . (sin x)(cos y) 8 . e'siny 9 . x+rry+ 2 y 

10 Does approach a limit as (x, y) approaches (0. 0)? If so, what 

y 

limit? 

II. Same questions for 


X 

12. Same questions for 


13. Same questions for 


14. Find the terms up to order 3 in Taylor’s formula for the function e' cos y. 

15. What is the term of degree 7 in Taylor's formula for the function 

x" - 2xy' + (x - DV'"? 

16. Show that if $f(x, y, z)$ is a polynomial in $x$, $y$, $z$, then it is equal to its own
Taylor series, i.e. there exists an integer $n$ such that $R_n = 0$.

17. Find the polynomial approximation of the function

$$f(x, y) = \log(1 + x + 2y)$$

up to degree 3.


and 


logd + xV y^) 
x2+ y2 


cos (xy) — 1 


sin (xy) — ly 


§4. Estimate for the remainder

Let us take $P = 0$ in Taylor's formula, and $H = X$. Then the formula
reads:

$$f(X) = f(0) + \frac{(X \cdot \nabla) f(0)}{1!} + \cdots + \frac{(X \cdot \nabla)^{s-1} f(0)}{(s-1)!} + \frac{(X \cdot \nabla)^s f(\tau X)}{s!}.$$

We wish to estimate the remainder.

Theorem 3. Let $a$ be a number $> 0$ and such that the closed ball of
radius $a$ around 0 is contained in the open set of definition of $f$. There
exists a number $C$ such that for all $X$ with $\|X\| \leq a$, we have

$$|(X \cdot \nabla)^s f(\tau X)| \leq C \|X\|^s.$$




Proof. For simplicity we limit ourselves to the case of two variables.
We take $X = (x, y)$. Then

\[ X \cdot \nabla = x D_1 + y D_2. \]

If we take its $s$-th power, then

\[ (x D_1 + y D_2)^s = x^s D_1^s + \cdots + y^s D_2^s. \]

The terms in between come from the binomial expansion, and are of type

\[ c_{ij}\, x^i y^j D_1^i D_2^j \]

with $i + j = s$, and suitable binomial coefficients $c_{ij} = s!/(i!\,j!)$.

All the repeated partial derivatives

\[ D_1^i D_2^j f \]

are continuous. We shall take for granted without proof that there is a
number $C'$ such that these partials are bounded by $C'$ in the disc of radius
$a$. In other words,

\[ |D_1^i D_2^j f(Q)| \le C' \]

whenever $\|Q\| \le a$ and $i + j = s$.

We know that

\[ |x| \le \|X\| \quad\text{and}\quad |y| \le \|X\| \]

because $\|X\| = \sqrt{x^2 + y^2}$. Hence

\[ |x^i y^j| \le \|X\|^i \|X\|^j = \|X\|^s. \]

The numbers $c_{ij}$ are fixed (binomial coefficients) and bounded by a
constant $C''$. We want to estimate the absolute value of the sum of the terms

\[ c_{ij}\, x^i y^j D_1^i D_2^j f(Q) \]

for $Q = \tau X$ (and hence $\|Q\| \le a$). According to the estimates we have
made above, each such term is bounded in absolute value by

\[ C'' \|X\|^i \|X\|^j C' = C' C'' \|X\|^s. \]

The sum consists of a finite number of terms. Hence we get

\[ |(X \cdot \nabla)^s f(Q)| \le C \|X\|^s \]

(with the constant $C$ equal to $C'C''$ times the number of terms in the sum).
This proves what we wanted.

When $X$ approaches 0, then $\|X\|^s$ approaches 0 much more rapidly
if $s \ge 2$. Thus the more terms we can take in the Taylor formula, the
better approximation to the function do we get.
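
The estimate of Theorem 3 can be observed numerically. The following is only a sketch under stated assumptions: we pick the function $e^x \cos y$ (our choice for illustration, not an example worked in the text), use its terms of order $\le 2$ at the origin, namely $1 + x + (x^2 - y^2)/2$, and look at the quotient of the error by $\|X\|^3$ along one direction. The names `taylor2` and `remainder_ratio` are ours.

```python
import math

def f(x, y):
    # Illustrative function (an assumption of this sketch): e^x cos y
    return math.exp(x) * math.cos(y)

def taylor2(x, y):
    # Terms of order <= 2 of e^x cos y at the origin
    return 1.0 + x + 0.5 * (x * x - y * y)

def remainder_ratio(x, y):
    """Return |f(X) - T_2(X)| / ||X||^3 for X = (x, y) different from 0."""
    norm = math.hypot(x, y)
    return abs(f(x, y) - taylor2(x, y)) / norm ** 3

# Shrink X along the fixed direction (0.6, 0.8), which has length 1.
ratios = [remainder_ratio(0.6 * h, 0.8 * h) for h in (1.0, 0.1, 0.01)]
for h, ratio in zip((1.0, 0.1, 0.01), ratios):
    print(h, ratio)
```

The printed quotients stay bounded as $X$ approaches 0, which is what an inequality of the form $|f(X) - T_2(X)| \le C\|X\|^3$ predicts.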





Exercises 

1. Let $f$ be a function as in the theorem concerning Taylor’s formula, with
$P = 0$. Let $a$ be a number $> 0$. Let $G(X)$ be a polynomial in $x_1, \ldots, x_n$ of
degree $\le s - 1$, such that

\[ |f(X) - G(X)| \le C \|X\|^s \]

for all $X$ satisfying $\|X\| \le a$. Show that there is only one polynomial $G(X)$
satisfying this condition, and that $G(X)$ consists of the terms of order $\le s - 1$
in Taylor’s formula.

2. Let $f(x_1, x_2)$ be a function with continuous partial derivatives of order $\le 2$.
Assume that $f(0, 0) = 0$. Show that there exist functions $g_1$, $g_2$ with continuous
partials of order $\le 1$ such that

\[ f(x_1, x_2) = x_1 g_1(x_1, x_2) + x_2 g_2(x_1, x_2). \]

[Hint: Use the fact that

\[ f(X) = \int_0^1 \frac{d}{dt} f(tX)\, dt.] \]

3. In the preceding exercise, assume that $D_i f(0) = 0$ for $i = 1, 2$. Show
that there exist continuous functions $h_{ij}(X)$ such that

\[ f(X) = \sum x_i x_j h_{ij}(X), \]

the sum being taken for $i, j = 1$ and 2, and $h_{ij} = h_{ji}$.

4. Let $f$ be a differentiable function defined on all of $n$-space. Assume that
$f(0) = 0$, and that $f(tX) = t f(X)$ for all numbers $t$ and vectors $X$. Show that
for all vectors $X$, we have

\[ f(X) = \operatorname{grad} f(0) \cdot X. \]


5. Let $f$ be a function with continuous partial derivatives of order $\le 3$.
Assume that $f(0) = 0$ and also that $f(tX) = t^2 f(X)$ for all numbers $t$ and vectors
$X$. Show that for all vectors $X$ we have

\[ f(X) = \frac{(X \cdot \nabla)^2 f(0)}{2!}. \]


6. Let $U$ be an open set having the following property: Given two points $X$, $Y$
in $U$, the line segment joining $X$ and $Y$ is contained in the open set.

(a) What is the parametric equation for this line segment?

(b) Let $f$ have continuous partial derivatives in $U$. Assume that
$\|\operatorname{grad} f(P)\| \le M$ for some number $M$, and all points $P$ in $U$. Show
that for any two points $X$, $Y$ in $U$, we have

\[ |f(X) - f(Y)| \le M \|X - Y\|. \]





7. Let $f$ be differentiable on the open set $U$. Let $X$ be a point of $U$, let $H$ be
a vector such that the line segment between $X$ and $X + H$ is contained in $U$.
Using the fact that

\[ f(X + H) - f(X) = \int_0^1 \frac{d}{dt} f(X + tH)\, dt, \]

show that

\[ |f(X + H) - f(X)| \le M \|H\|, \]

if $M$ is a number such that $\|\operatorname{grad} f(P)\| \le M$ for all $P$ on the above-mentioned
line segment.



CHAPTER VII 


Maximum and Minimum 

When we studied functions of one variable, we found maxima and 
minima by first finding critical points, i.e. points where the derivative is 
equal to 0, and then determining by inspection which of these are maxima 
or minima. We can carry out a similar investigation for functions of 
several variables. The condition that the derivative is equal to 0 must 
be replaced by the vanishing of all partial derivatives. 

§1. Critical points

Let $f$ be a differentiable function defined on an open set $U$. Let $P$ be
a point of $U$. If all partial derivatives of $f$ are equal to 0 at $P$, then we
say that $P$ is a critical point of the function. In other words, for $P$ to be a
critical point, we must have

\[ D_1 f(P) = 0, \ \ldots, \ D_n f(P) = 0. \]

Example. Find the critical points of the function $f(x, y) = e^{-(x^2 + y^2)}$.

Taking the partials, we see that

\[
\frac{\partial f}{\partial x} = -2x\, e^{-(x^2 + y^2)}
\quad\text{and}\quad
\frac{\partial f}{\partial y} = -2y\, e^{-(x^2 + y^2)}.
\]

The only value of $(x, y)$ for which both these quantities are equal to 0 is
$x = 0$ and $y = 0$. Hence the only critical point is $(0, 0)$.

A critical point of a function of one variable is a point where the deriva¬ 
tive is equal to 0. We have seen examples where such a point need not be 
a local maximum or a local minimum, for instance as in the following 
picture: 



A fortiori, a similar thing may occur for functions of several variables. 
However, once we have found critical points, it is usually not too difficult 
to tell by inspection whether they are of this type or not. 



Let $f$ be any function (differentiable or not), defined on an open set $U$.
We shall say that a point $P$ of $U$ is a local maximum for the function if
there exists an open ball (of positive radius) $B$, centered at $P$, such that
for all points $X$ of $B$, we have

\[ f(X) \le f(P). \]

As an exercise, define local minimum in an analogous manner.

In the case of functions of one variable, we took an open interval instead
of an open ball around the point $P$. Thus our notion of local maximum
in $n$-space is the natural generalization of the notion in 1-space.

Theorem 1. Let $f$ be a function which is defined and differentiable on
an open set $U$. Let $P$ be a local maximum for $f$ in $U$. Then $P$ is a critical
point of $f$.

Proof. The proof is exactly the same as for functions of one variable.
In fact, we shall prove that the directional derivative of $f$ at $P$ in any
direction is 0. Let $H$ be a non-zero vector. For small values of $t$, $P + tH$
lies in the open set $U$, and $f(P + tH)$ is defined. Furthermore, for small
values of $t$, $tH$ is small, and hence $P + tH$ lies in our open ball such that

\[ f(P + tH) \le f(P). \]

Hence the function of one variable $g(t) = f(P + tH)$ has a local maximum
at $t = 0$. Hence its derivative $g'(0)$ is equal to 0. By the chain rule, we
obtain as usual:

\[ \operatorname{grad} f(P) \cdot H = 0. \]

This equation is true for every non-zero vector $H$, and hence

\[ \operatorname{grad} f(P) = 0. \]

This proves what we wanted.


Exercises 

Find the critical points of the following functions.

1. j* -r ■^ly — — 8z — 6 j/ 2. j + j/ sin x 

;j, j- 4 y- -i- 2- 4. {x + y)€-^^ 

ft. xy -f- xz G. cos + .2“) 

7. x-ir 8. T r 

‘J. {x — y)* 10. X sin y 


•> 


11. x- 


- X 


12. c 




14. In the preceding exercises, find the minimum value of the given function,
and give all points where the value of the function is equal to this minimum.





§2. The quadratic form

Let $f$ be a differentiable function on an open set $U$, and assume that all
partial derivatives up to order 3 exist and are continuous. Let $P$ be a
point of $U$.

According to Taylor’s formula, we have

\[ f(P + H) = f(P) + (H \cdot \nabla) f(P) + \frac{1}{2!} (H \cdot \nabla)^2 f(P) + R_3, \]

where $R_3$, the remainder term, satisfies the estimate

\[ |R_3| \le C \|H\|^3, \]

provided $\|H\| \le a$ for some number $a > 0$.

If $\operatorname{grad} f(P) = 0$, then

\[ (H \cdot \nabla) f(P) = 0. \]

In that case, the next best approximation to the function $f$ near $P$ is given
by the term

\[ \frac{1}{2!} (H \cdot \nabla)^2 f(P), \]

which we shall call the quadratic term (or term of order 2) in Taylor’s
formula.

Let $K$ be a fixed vector $\ne 0$ and let $H = tK$ with small values of $t$.
Then

\[ (tK \cdot \nabla)^2 f(P) = t^2 (K \cdot \nabla)^2 f(P) \]

and

\[ \|H\|^3 = |t|^3 \|K\|^3. \]

If $(K \cdot \nabla)^2 f(P) \ne 0$, then for small $t$ the quadratic term is much larger
than the error term $R_3$. Hence the quadratic term describes the value of the
function to within the approximation given by a cubic term.

In practice, the quadratic term is the most important. We write it out
in full in the case of two variables, with $H = (h, k)$:

\[
\frac{1}{2}\left[ h^2 \frac{\partial^2 f}{\partial x^2}(P)
 + 2hk \frac{\partial^2 f}{\partial x\, \partial y}(P)
 + k^2 \frac{\partial^2 f}{\partial y^2}(P) \right].
\]

The function of $x$, $y$ given by

\[ \frac{1}{2}\left[ x^2 D_1^2 f(P) + 2xy\, D_1 D_2 f(P) + y^2 D_2^2 f(P) \right] \]

is called the quadratic form associated to $f$ at $P$ (whenever $\operatorname{grad} f(P) = 0$).

Example. Let $f(x, y) = e^{-(x^2 + y^2)}$. Then you will verify immediately
that

\[ \operatorname{grad} f(0, 0) = 0. \]

Taking $P = (0, 0)$ to be the origin, we see that the quadratic form associated
to $f$ at $P$ is

\[ -(x^2 + y^2). \]

Indeed, an easy computation shows that $D_1^2 f(0) = -2$, $D_1 D_2 f(0) = 0$
and $D_2^2 f(0) = -2$. Substituting these values in the general formula gives
the desired expression $-(x^2 + y^2)$.
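
The second partial derivatives at a critical point can also be estimated numerically. The sketch below assumes the function of the example is $e^{-(x^2+y^2)}$ and uses central differences with a step `h` chosen by us; the helper names `second_partials` and `quadratic_form` are hypothetical and serve only this illustration.

```python
import math

def f(x, y):
    # Function assumed for the example: e^{-(x^2 + y^2)}
    return math.exp(-(x * x + y * y))

def second_partials(func, px, py, h=1e-4):
    """Central-difference estimates of D1^2 f, D1 D2 f, D2^2 f at (px, py)."""
    f0 = func(px, py)
    d11 = (func(px + h, py) - 2 * f0 + func(px - h, py)) / h ** 2
    d22 = (func(px, py + h) - 2 * f0 + func(px, py - h)) / h ** 2
    d12 = (func(px + h, py + h) - func(px + h, py - h)
           - func(px - h, py + h) + func(px - h, py - h)) / (4 * h ** 2)
    return d11, d12, d22

def quadratic_form(func, px, py):
    """Return q(x, y) = (1/2)(x^2 D1^2 f + 2xy D1 D2 f + y^2 D2^2 f) at (px, py)."""
    d11, d12, d22 = second_partials(func, px, py)
    return lambda x, y: 0.5 * (d11 * x * x + 2 * d12 * x * y + d22 * y * y)

q = quadratic_form(f, 0.0, 0.0)
print(second_partials(f, 0.0, 0.0))  # estimates of D1^2 f, D1 D2 f, D2^2 f at the origin
print(q(1.0, 2.0))                   # compare with -(1^2 + 2^2)
```

Up to the error of the difference quotients, the estimates agree with the values $-2$, $0$, $-2$ and with the form $-(x^2 + y^2)$.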

Exercise

1. Find the quadratic form associated to the function $f$ at the critical
points $P$ in the Exercises of §1.


§3. Boundary points


In considering intervals, we had to distinguish between closed and open
intervals. We must make an analogous distinction when considering sets of
points in space.

Let $S$ be a set of points, in some $n$-space. Let $P$ be a point of $S$. We
shall say that $P$ is an interior point of $S$ if there exists an open ball $B$ of
positive radius, centered at $P$, and such that $B$ is contained in $S$. The
next picture illustrates an interior point.

We have also drawn an open ball around $P$.

From the definition, we conclude that the set consisting of all
interior points of $S$ is an open set.

A point $P$ (not necessarily in $S$) is called a boundary point of $S$ if every
open ball $B$ centered at $P$ includes a point of $S$, and also a point which is
not in $S$. We illustrate a boundary point in the following picture:



For example, the set of boundary points of the closed ball of radius
$a > 0$ is the sphere of radius $a$. In 2-space, the plane, the region consisting
of all points with $y > 0$ is open. Its boundary points are the points
lying on the $x$-axis.

If a set contains all of its boundary points, then we shall say that the
set is closed.

Finally, a set is said to be bounded if there exists a number $b > 0$ such
that, for every point $X$ of the set, we have

\[ \|X\| \le b. \]

We are now in a position to state the existence of maxima and minima 
for continuous functions. 

Theorem 2. Let $S$ be a closed and bounded set. Let $f$ be a continuous
function defined on $S$. Then $f$ has a maximum and a minimum in $S$.
In other words, there exists a point $P$ in $S$ such that

\[ f(P) \ge f(X) \]

for all $X$ in $S$, and there exists a point $Q$ in $S$ such that

\[ f(Q) \le f(X) \]

for all $X$ in $S$.

We shall not prove this theorem. It depends on an analysis which is 
beyond the level of this course. 

When trying to find a maximum (say) for a function $f$, one should first
determine the critical points of $f$ in the interior of the region under
consideration. If a maximum lies in the interior, it must be among these
critical points.

Next, one should investigate the function on the boundary of the region.
By parametrizing the boundary, one frequently reduces the problem of
finding a maximum on the boundary to a lower-dimensional problem, to
which the technique of critical points can also be applied.

Finally, one has to compare the possible maximum of $f$ on the boundary
and in the interior to determine which points are maximum points.
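
The three steps can be sketched in a small computation. Everything specific here is an assumption made for illustration: we take the function $x^2 + y^2 - x$ on the closed disc of radius 1 (not an example from the text), whose only interior critical point is $(1/2, 0)$, and we sample the boundary through the parametrization $(\cos t, \sin t)$ instead of solving for its critical points exactly.

```python
import math

def f(x, y):
    # Illustrative function on the closed unit disc (our choice)
    return x * x + y * y - x

# Step 1: interior critical points. grad f = (2x - 1, 2y) vanishes only at (1/2, 0).
interior_candidates = [(0.5, 0.0)]

# Step 2: the boundary, parametrized by t -> (cos t, sin t) and sampled finely.
n = 3600
boundary_candidates = [
    (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
    for k in range(n)
]

# Step 3: compare the values of f at all candidates.
candidates = interior_candidates + boundary_candidates
max_point = max(candidates, key=lambda p: f(*p))
min_point = min(candidates, key=lambda p: f(*p))

print("maximum near", max_point, "value", f(*max_point))
print("minimum at", min_point, "value", f(*min_point))
```

In this sketch the minimum is found at the interior critical point, while the maximum lies on the boundary, so neither step can be skipped.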

Example. In the Example in §1, we observe that the function
$f(x, y) = e^{-(x^2 + y^2)}$ becomes very small as $x$ or $y$ becomes large. Consider some big
closed disc centered at the origin. We know by Theorem 2 that the
function has a maximum in this disc. Since the value of the function is
small on the boundary, it follows that this maximum must be an interior
point, and hence that the maximum is a critical point. But we found in the
Example in §1 that the only critical point is at the origin. Hence we conclude
that the origin is the only maximum of the function $f(x, y)$. The
value of $f$ at the origin is $f(0, 0) = 1$. Furthermore, the function has no
minimum, because $f(x, y)$ is always positive and approaches 0 as $x$ and $y$
become large.





Exercises 

Find the maximum and minimum points of the following functions in the 
indicated region. 

1. $x + y$ in the square with corners at $(\pm 1, \pm 1)$.

2. z + y + z in the region z^ + + z^ ^ 1. 

3. zy — (1 — z^ — y^)*'^ in the region z- + y^ ^ 1. 

4. 144z^y^(I — z — y) in the region z ^ 0 and y ^ 0 (the first quadrant 

together with its boundary). 

5. (z^ + 2y^)e“t'^'''‘'*' in the plane. 

6. (z^ + y^)”* in the region (z — 2)^ + y^ ^ 1. 

7. Which of the following functions have a maximum and which .have a 
minimum in the whole plane? 

(a) (z + 2y)e-'*-*'* (b) e'"* 

(c) (d) 

(e) (3z^ -}- 2y^)e“‘^'“(0 -zV'+''“’ 

\ 0 if (z, y) = (0. 0) 

8. What is the point on the curve $(\cos t, \sin t, \sin(t/2))$ farthest from the
origin?

§4. Lagrange multipliers

In this section, we shall investigate another method for finding the
maximum or minimum of a function on some set of points. This method is
particularly well adapted to the case when the set of points is described
by means of an equation.

We shall work in 3-space. Let $g$ be a differentiable function of three
variables $x, y, z$. We consider the surface

\[ g(X) = 0. \]

Let $U$ be an open set containing this surface, and let $f$ be a differentiable
function defined for all points of $U$. We wish to find those points $P$ on
the surface $g(X) = 0$ such that $f(P)$ is a maximum or a minimum on the
surface. In other words, we wish to find all points $P$ such that $g(P) = 0$,
and either

\[ f(P) \ge f(X) \quad\text{for all $X$ such that } g(X) = 0, \]

or

\[ f(P) \le f(X) \quad\text{for all $X$ such that } g(X) = 0. \]

Any such point will be called an extremum for $f$ subject to the constraint $g$.





In what follows, we consider only points $P$ such that $g(P) = 0$ but
$\operatorname{grad} g(P) \ne 0$.

We shall now show that for any extremum point $P$ for $f$ subject to the
constraint $g$, there exists a number $\lambda$ such that

\[ \operatorname{grad} f(P) = \lambda \operatorname{grad} g(P). \]

Indeed, let $X(t)$ be a (differentiable) curve on the surface passing through
$P$, say $X(t_0) = P$. Then the function $f(X(t))$ has a maximum or a minimum
at $t_0$. Its derivative

\[ \frac{d}{dt} f(X(t)) \]

is therefore equal to 0 at $t_0$. But this derivative is equal to

\[ \operatorname{grad} f(P) \cdot X'(t_0) = 0. \]

Hence $\operatorname{grad} f(P)$ is perpendicular to every curve on the surface passing
through $P$. It can be shown that under these circumstances, and the
hypothesis that $\operatorname{grad} g(P) \ne 0$, there exists a number $\lambda$ such that

\[ (1) \qquad \operatorname{grad} f(P) = \lambda \operatorname{grad} g(P), \]

or in other words, $\operatorname{grad} f(P)$ has the same, or opposite direction, as
$\operatorname{grad} g(P)$, provided it is not 0. Intuitively, this is rather clear, since the
direction of $\operatorname{grad} g(P)$ is the direction perpendicular to the surface, and
we have seen that $\operatorname{grad} f(P)$ is also perpendicular to the surface. To give
a complete proof would require technical arguments in linear algebra,
which we shall omit.

Conversely, when we want to find an extremum point for $f$ subject to
the constraint $g$, we find all points $P$ such that $g(P) = 0$, and such that
relation (1) is satisfied. We can then find our extremum points among
these by inspection.

(Note that this procedure is analogous to the procedure used to find
maxima or minima for functions of one variable. We first determined all
points at which the derivative is equal to 0, and then determined maxima
or minima by inspection.)

Example. Find the extrema for the function $x^2 + y^2 + z^2$ subject to
the constraint $x^2 + 2y^2 - z^2 - 1 = 0$.

Computing the partial derivatives of the functions $f$ and $g$, we find that
we must solve the system of equations

(a) $2x = \lambda \cdot 2x$,        (b) $2y = \lambda \cdot 4y$,

(c) $2z = \lambda \cdot (-2z)$,     (d) $g(X) = x^2 + 2y^2 - z^2 - 1 = 0$.

Let $(x_0, y_0, z_0)$ be a solution. If $z_0 \ne 0$, then from (c) we conclude
that $\lambda = -1$. The only way to solve (a) and (b) with $\lambda = -1$ is that
$x = y = 0$. In that case, from (d), we would get

\[ z_0^2 = -1, \]

which is impossible. Hence any solution must have $z_0 = 0$.

If $x_0 \ne 0$, then from (a) we conclude that $\lambda = 1$. From (b) and (c)
we then conclude that $y_0 = z_0 = 0$. From (d), we must have $x_0 = \pm 1$.
In this manner, we have obtained two solutions satisfying our conditions,
namely

\[ (1, 0, 0) \quad\text{and}\quad (-1, 0, 0). \]

Similarly, if $y_0 \ne 0$, we find $\lambda = 1/2$ and two more solutions, namely

\[ (0, \sqrt{1/2}, 0) \quad\text{and}\quad (0, -\sqrt{1/2}, 0). \]

These four points are therefore the only possible extrema of the function $f$
subject to the constraint $g$.

If we ask for the minimum of $f$, then a direct computation shows that
the last two points

\[ (0, \pm\sqrt{1/2}, 0) \]

are the only possible solutions (because $1 > 1/2$).
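
The four candidate points can be checked mechanically. The sketch below assumes the function $x^2 + y^2 + z^2$ and the constraint $x^2 + 2y^2 - z^2 - 1 = 0$; the helper `satisfies_conditions` and the tolerance are our own choices for illustration. It verifies $g(P) = 0$ and $\operatorname{grad} f(P) = \lambda \operatorname{grad} g(P)$ at each point and then compares the values of $f$.

```python
import math

def f(x, y, z):
    return x * x + y * y + z * z

def g(x, y, z):
    return x * x + 2 * y * y - z * z - 1

def grad_f(x, y, z):
    return (2 * x, 2 * y, 2 * z)

def grad_g(x, y, z):
    return (2 * x, 4 * y, -2 * z)

def satisfies_conditions(point, lam, tol=1e-12):
    """Check g(P) = 0 and grad f(P) = lam * grad g(P) up to a tolerance."""
    on_surface = abs(g(*point)) < tol
    parallel = all(abs(a - lam * b) < tol
                   for a, b in zip(grad_f(*point), grad_g(*point)))
    return on_surface and parallel

r = math.sqrt(0.5)
# Candidate points with the multiplier found for each of them.
candidates = {
    (1.0, 0.0, 0.0): 1.0,
    (-1.0, 0.0, 0.0): 1.0,
    (0.0, r, 0.0): 0.5,
    (0.0, -r, 0.0): 0.5,
}

for point, lam in candidates.items():
    print(point, satisfies_conditions(point, lam), f(*point))
```

The smallest value of $f$ among the candidates occurs at the two points on the $y$-axis.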


Exercises

1. Find tin' minimum of the function x + y- subject to the constraint 

2x-+ = 1. 

2. Find the maximum value of x" xy + y- -j- yx + ;• on the sphere of 
radius 1. 

3. Let $A = (1, 1, -1)$, $B = (2, 1, 3)$, $C = (2, 0, -1)$. Find the point at
which the function

\[ f(X) = (X - A)^2 + (X - B)^2 + (X - C)^2 \]

reaches its minimum, and find the minimum value.

4. Do Exercise 3 in general, for any three distinct vectors

\[ A = (a_1, a_2, a_3), \quad B = (b_1, b_2, b_3), \quad C = (c_1, c_2, c_3). \]

5. Find the maximum of the function $3x^2 + 2\sqrt{2}\, xy + 4y^2$ on the circle of
radius 3 in the plane.

6. Find the maximum of the function $xyz$ subject to the constraints $x \ge 0$,
$y \ge 0$, $z \ge 0$ and $xy + yz + xz = 2$.





7. Find the maximum and minimum distance from pmints on the curve 

f i>x>j -h oij~ = 0 

and the orij'in in the ))lane. 

8. Find the extreme values of the function $\cos^2 x + \cos^2 y$ subject to the
constraint $x - y = \pi/4$ and $0 \le x \le \pi$.

9. Find the points on the surface $z^2 - xy = 1$ nearest to the origin.

10. Find the extreme values of the function $xy$ subject to the condition
$x + y = 1$.

11. Find the shortest distance between the point $(1, 0)$ and the curve $y^2 = 4x$.

12. Let $x$, $y$ be two positive numbers. Show that

\[ \sqrt{xy} \le \frac{x + y}{2}. \]

13. Let $x$, $y$, $z$ be three positive numbers. Show that

\[ (xyz)^{1/3} \le \frac{x + y + z}{3}. \]

14. Generalize Exercise 13 to $n$ numbers.



CHAPTER VIII 


Multiple Integrals 

When studying functions of one variable, it was possible to give essen¬ 
tially complete proofs for the existence of an integral of a continuous 
function over an interval. The investigation of the integral involved lower 
sums and upper sums. 

In order to develop a theory of integration for functions of several 
variables, it becomes necessary to have techniques whose degree of sophis¬ 
tication is somewhat greater than that which is available to us. Hence we 
shall only state results, and omit proofs. These results will allow us to 
compute multiple integrals. 

We shall also list various formulas giving double and triple integrals in 
terms of polar coordinates, and we give a geometric argument to make 
them plausible. Here again, the general formula for changing variables in 
a multiple integral can be handled theoretically (and elegantly) only when 
much more machinery is available than we have at present. These topics
properly belong to an advanced calculus course, and no good purpose
would be achieved by giving here a half-baked treatment.

§1. Double integrals

We begin by discussing the analogue of upper and lower sums associated
with partitions.

Let $R$ be a region of the plane, and let $f$ be a function defined on $R$.
We shall say that $f$ is bounded if there exists a number $M$ such that
$|f(X)| \le M$ for all $X$ in $R$.

Let $a$, $b$ be two numbers with $a \le b$, and let $c$, $d$ be two numbers with
$c \le d$. We consider the closed interval $[a, b]$ on the $x$-axis and the closed
interval $[c, d]$ on the $y$-axis. These determine a rectangle $R$ in the plane,
consisting of all points $(x, y)$ with $a \le x \le b$ and $c \le y \le d$.

The rectangle $R$ above will be denoted by $[a, b] \times [c, d]$.



Consider a partition of the interval $[a, b]$, in other words a sequence of
numbers

\[ x_1 = a \le x_2 \le \cdots \le x_{n+1} = b, \]

and a partition of $[c, d]$, namely

\[ y_1 = c \le y_2 \le \cdots \le y_{m+1} = d. \]

Each pair of small intervals $[x_i, x_{i+1}]$ and $[y_j, y_{j+1}]$ determines a small
rectangle as indicated on the next figure. We shall call this small rectangle
$R_{ij}$. Thus we may view our partitions together as determining a partition
of the big rectangle into small rectangles.

Let $f$ be a function defined on $R$, and assume that $f$ is bounded. For
each pair $i, j$ let $M_{ij}$ be the least upper bound of the values of $f$ on $R_{ij}$;
in other words, $M_{ij}$ is the least upper bound of all numbers $f(X)$, where
$X$ ranges over points in $R_{ij}$. This least upper bound exists because we
assumed $f$ bounded, and we can apply a standard property of the real
numbers.

We take the sum of all terms

\[ M_{ij} (y_{j+1} - y_j)(x_{i+1} - x_i) \]

with $1 \le i \le n$ and $1 \le j \le m$, and write this sum with the usual
notation, namely

\[ \sum_{i=1}^{n} \sum_{j=1}^{m} M_{ij} (y_{j+1} - y_j)(x_{i+1} - x_i). \]

This sum will be called an upper sum associated with the partitions and
the function $f$.

Similarly, if $m_{ij}$ denotes the greatest lower bound of the values of $f$ on
the small rectangle $R_{ij}$, then we can form a lower sum

\[ \sum_{i=1}^{n} \sum_{j=1}^{m} m_{ij} (y_{j+1} - y_j)(x_{i+1} - x_i). \]

If $P_{ij}$ is any point in the rectangle $R_{ij}$, then

\[ m_{ij} \le f(P_{ij}) \le M_{ij}, \]

and hence the sum

\[ \sum_{i=1}^{n} \sum_{j=1}^{m} f(P_{ij}) (y_{j+1} - y_j)(x_{i+1} - x_i) \]

lies between the lower and upper sum.
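
Upper and lower sums are easy to compute when the least upper bound and greatest lower bound on each small rectangle are known. The sketch below relies on a simplifying assumption that we state loudly: the function is increasing in each variable on the rectangle, so that $M_{ij}$ is attained at the upper right corner and $m_{ij}$ at the lower left corner. The function $x^2 y$ on $[0, 1] \times [0, 1]$ and the helper names are chosen by us for illustration.

```python
def partition(a, b, n):
    """Return n + 1 equally spaced points x_1 = a <= ... <= x_{n+1} = b."""
    return [a + (b - a) * k / n for k in range(n + 1)]

def lower_upper_sums(f, xs, ys):
    """Lower and upper sums for f, ASSUMING f is increasing in x and in y,
    so the bounds on each small rectangle are attained at its corners."""
    lower = upper = 0.0
    for i in range(len(xs) - 1):
        for j in range(len(ys) - 1):
            area = (ys[j + 1] - ys[j]) * (xs[i + 1] - xs[i])
            lower += f(xs[i], ys[j]) * area          # m_ij at lower left corner
            upper += f(xs[i + 1], ys[j + 1]) * area  # M_ij at upper right corner
    return lower, upper

f = lambda x, y: x * x * y  # increasing in x and y on [0, 1] x [0, 1]

for n in (10, 50):
    lower, upper = lower_upper_sums(f, partition(0.0, 1.0, n), partition(0.0, 1.0, n))
    print(n, lower, upper)
```

As the partitions are refined, the lower and upper sums close in on a single number from both sides, here the value $1/6$.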

We shall state a theorem which gives us conditions under which there
exists a unique number which is greater than or equal to every lower sum,
and less than or equal to every upper sum. If such a number exists, we
say that $f$ is integrable on $R$, and we call this number its integral over $R$.
We denote it by

\[ \iint_R f \quad\text{or}\quad \iint_R f(x, y)\, dy\, dx. \]

Under suitable conditions, we can interpret the integral as a volume.
Indeed, suppose that $f$ is continuous, and that $f(x, y) \ge 0$ for all $(x, y)$ in
the rectangle. The value $f(x, y)$ at a point $(x, y)$ may be viewed as a
height above the point $(x, y)$, and we may consider the integral of $f$ as
the volume of the 3-dimensional region lying above the rectangle and
bounded from above by the values of $f$.

We can also interpret the rectangle as a metal sheet, and a function $f$ as
giving a density distribution on it. Then the integral is interpreted as the
mass of the sheet.

To state the next theorem, we need some terminology.

Let $s$, $t$ be numbers with $s \le t$. Let $f$ be a function defined on the
closed interval $[s, t]$. If $f$ is differentiable, and if its derivative is continuous,
we shall say that $f$ is smooth. Let $f$, $g$ be two functions defined on $[s, t]$.
If both $f$ and $g$ are smooth, then the set of points $(f(x), g(x))$ as $x$ ranges
over the interval will be called a smooth curve. (In preceding chapters we
had considered curves arising from open intervals, but for this chapter,
we change our meaning and deal only with closed intervals. If the interval
consists of a single point, we define any function of that point to be
differentiable, and we agree to say that its derivative is 0. If $s < t$, then at
the end points, the derivative is meant to be the right or left derivative
respectively.)

Let $S$ be a region in the plane. We say as usual that $S$ is bounded if
there is a number $M$ such that $\|X\| \le M$ for all points $X$ in $S$. The set
of boundary points of $S$ will also be called the boundary of $S$. We shall
say that the boundary of $S$ is smooth if it consists of a finite number of
smooth curves.

Let $S$ be a region in the plane, and let $f$ be a function defined on $S$.
As usual, we say that $f$ is continuous at a point $P$ of $S$ if

\[ \lim_{X \to P} f(X) = f(P). \]

We say that $f$ is continuous on $S$ if it is continuous at every point of $S$.





Theorem 1. Let $R$ be a rectangle as above, and let $f$ be a function
defined on $R$, bounded, and continuous except possibly at the points lying
on a finite number of smooth curves. Then $f$ is integrable on $R$.

To compute the integral we shall investigate repeated integrals.

Let $f$ be a function defined on our rectangle. For each $x$ in the interval
$[a, b]$ we have a function $\varphi$ of $y$ given by $\varphi(y) = f(x, y)$, and this function
is defined on the interval $[c, d]$. Assume that for each $x$ this function $\varphi$
is integrable over this interval (in the old sense of the word, for functions
of one variable). We may then form the integral

\[ \int_c^d \varphi(y)\, dy = \int_c^d f(x, y)\, dy. \]

The expression we obtain depends on the particular value of $x$ chosen in
the interval $[a, b]$, and is thus a function of $x$. Assume that this function
is integrable over the interval $[a, b]$. We can then take the integral

\[ \int_a^b \left[ \int_c^d f(x, y)\, dy \right] dx, \quad\text{also written}\quad \int_a^b \int_c^d f(x, y)\, dy\, dx, \]

which is called the repeated integral of $f$.

Example 1. Let $f(x, y) = x^2 y$. Find the repeated integral of $f$ over the
rectangle determined by the intervals $[1, 2]$ on the $x$-axis and $[-3, 4]$ on
the $y$-axis.

We find the double integral

\[ \int_1^2 \int_{-3}^4 x^2 y\, dy\, dx. \]

To do this, we first compute the integral with respect to $y$, namely

\[ \int_{-3}^4 x^2 y\, dy. \]

For a fixed value of $x$, we can take $x^2$ out of the integral, and hence this
inner integral is equal to

\[ x^2 \int_{-3}^4 y\, dy = x^2\, \frac{y^2}{2} \Big|_{-3}^{4} = \frac{7x^2}{2}. \]

We then integrate with respect to $x$, namely

\[ \int_1^2 \frac{7x^2}{2}\, dx = \frac{7}{2} \cdot \frac{x^3}{3} \Big|_1^2 = \frac{49}{6}. \]

Thus the integral of $f$ over the rectangle is equal to $49/6$.


We shall now extend the definition of the integral to more general
regions than rectangles. Let $S$ be a bounded region, and assume that the
boundary of $S$ consists of a finite number of smooth curves. Let $f$ be a
function defined on $S$, which is bounded and continuous except at a
finite number of smooth curves. We can always find intervals $[a, b]$ and
$[c, d]$ such that $S$ is contained in the rectangle

\[ R = [a, b] \times [c, d], \]

because $S$ is bounded. We define $f$ on $R$ to be equal to 0 at points lying
outside $S$. It can be shown that the integral

\[ \iint_R f \]

does not depend on the choice of rectangle $R$ containing $S$, and we define
the integral

\[ \iint_S f \]

to be the integral of $f$ over $R$.

We shall now make a list of the properties of the integral. In order to
avoid repetitions, we shall assume throughout that any region we speak
of is bounded, has a boundary consisting of a finite number of smooth
curves, and that any function defined on a region is bounded, and
continuous except on a finite number of smooth curves.

Property 1. If $f$, $g$ are two functions defined on a region $S$, then

\[ \iint_S (f + g) = \iint_S f + \iint_S g. \]

If $c$ is a number, then

\[ \iint_S c f = c \iint_S f. \]

Property 2. If $f$, $g$ are two functions defined on a region $S$, and
$f(X) \le g(X)$ for all points $X$ in $S$, then

\[ \iint_S f \le \iint_S g. \]

Property 3. If $S$ can be expressed as a union of two regions $S_1$, $S_2$ having
no point in common except possibly boundary points, and $f$ is a function
defined on $S$, then

\[ \iint_S f = \iint_{S_1} f + \iint_{S_2} f. \]

The following situation will arise frequently in practice.

Let $g_1$, $g_2$ be two smooth functions on a closed interval $[a, b]$ ($a \le b$) such that
$g_1(x) \le g_2(x)$ for all $x$ in that interval. Let $c$, $d$ be numbers such that

\[ c < g_1(x) \le g_2(x) < d \]

for all $x$ in the interval $[a, b]$. Then $g_1$, $g_2$ determine a region $S$ lying
between $x = a$, $x = b$, and the two curves $y = g_1(x)$, $y = g_2(x)$.

Let $f$ be a function which is continuous on the region $S$, and define $f$
on the rectangle $[a, b] \times [c, d]$ to be equal to 0 at any point of the rectangle
not lying in the region $S$. For any value $x$ in the interval $[a, b]$
the integral

\[ \int_c^d f(x, y)\, dy \]

can be written as a sum:

\[
\int_c^{g_1(x)} f(x, y)\, dy + \int_{g_1(x)}^{g_2(x)} f(x, y)\, dy + \int_{g_2(x)}^{d} f(x, y)\, dy.
\]

Since $f(x, y) = 0$ whenever $c \le y < g_1(x)$ or $g_2(x) < y \le d$, it follows
that the two extreme integrals are equal to 0. Thus the repeated integral
of $f$ over the rectangle is in fact equal to the repeated integral

\[ \int_a^b \left[ \int_{g_1(x)}^{g_2(x)} f(x, y)\, dy \right] dx. \]

Regions of the type described by two functions $g_1$, $g_2$ as above are the
most common type of regions with which we deal.

The repeated integral is useful to compute a double integral because of
the following theorem.

Theorem 2. Let $g_1$, $g_2$ be two smooth functions defined on a closed
interval $[a, b]$ ($a \le b$) such that $g_1(x) \le g_2(x)$ for all $x$ in that interval.
Let $f$ be a continuous function on the region $S$ lying between $x = a$, $x = b$,
and the two curves $g_1(x)$ and $g_2(x)$. Then

\[ \iint_S f = \int_a^b \left[ \int_{g_1(x)}^{g_2(x)} f(x, y)\, dy \right] dx; \]

in other words, the double integral is equal to the repeated integral.





Since we know how to evaluate the integral of a function of one variable 
in many cases, the preceding theorem makes it possible to compute 
double integrals in terms of integrals of functions of one variable. 

Given a region $S$ it is frequently possible to break it up into smaller
regions having only boundary points in common, and such that each
smaller region is of the type we have just described. In that case, to
compute the integral of a function over $S$, we can apply Property 3.


Example 2. Let $f(x, y) = 2xy$. Find the integral of $f$ over the triangle
bounded by the lines $y = 0$, $y = x$, and the line $x + y = 2$.

The region is as shown in the figure. We break up our region into the portion
from 0 to 1 and the portion from 1 to 2. These correspond to the small
triangles $S_1$, $S_2$ indicated in the picture. Then

\[ \iint_{S_1} f = \int_0^1 \left[ \int_0^{x} 2xy\, dy \right] dx \]

and

\[ \iint_{S_2} f = \int_1^2 \left[ \int_0^{2-x} 2xy\, dy \right] dx. \]

There is no difficulty in evaluating these integrals, and we leave them to
you.
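
A repeated integral of this kind can be approximated by applying a one-variable rule twice. The sketch below uses the midpoint rule, which is our choice and not a method of the text; the helper names `integrate` and `repeated_integral` are hypothetical. It treats the two small triangles separately, as Property 3 suggests, with the bounds $0 \le y \le x$ for $0 \le x \le 1$ and $0 \le y \le 2 - x$ for $1 \le x \le 2$.

```python
def integrate(func, lo, hi, n=200):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    if hi == lo:
        return 0.0
    step = (hi - lo) / n
    return step * sum(func(lo + (k + 0.5) * step) for k in range(n))

def repeated_integral(f, a, b, g1, g2, n=200):
    """Approximate the integral over [a, b] of the inner integral of f(x, y) dy
    taken from y = g1(x) to y = g2(x)."""
    def inner(x):
        return integrate(lambda y: f(x, y), g1(x), g2(x), n)
    return integrate(inner, a, b, n)

f = lambda x, y: 2 * x * y

part1 = repeated_integral(f, 0.0, 1.0, lambda x: 0.0, lambda x: x)        # over S_1
part2 = repeated_integral(f, 1.0, 2.0, lambda x: 0.0, lambda x: 2.0 - x)  # over S_2
total = part1 + part2
print(part1, part2, total)
```

The two approximations can be compared with the values obtained by hand; their sum approximates the integral of $f$ over the whole triangle.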


Remark. In the statement of Theorem 2, one might ask why we required
$f$ to be continuous in the whole region $S$. The reason is that if we
allow discontinuities on smooth curves, then there will be values $x_0$ of $x$ such
that the integral

\[ \int f(x_0, y)\, dy \]

is not defined. As an example, consider a rectangle, and a function $f$
which is equal to 1 at every point except on one vertical line $x = x_0$.

If one defines $f$ in a sufficiently horrible way on this line, then

\[ \int f(x_0, y)\, dy \]

is not defined, i.e. the function $\varphi(y) = f(x_0, y)$ is not integrable.






Exercises 


1. Find the value of the following repeated integrals.

2 ,-3 ''» 


(ii) (x + y) dx dy 

(c) ( I \Gdz dy 
JQ Ju^ 


(b) yj ydydx 


(d) ( [ J sin 1 / rfy dx 
Jo Jo 


n y' 

dxdy 


ydy dx 


2. Find the integral of the following functions.

(a) X cos (x + {/) over the triangle who.se vertices are (0, 0), (jt, 0), and 

(tt.tt ). 

(h) over the region defiiMui by |x| + ]i/I g 1. 

(c) x“ — y^ over the region hounded by the curve y = sin x between 
0 and TT. 

(d) x^ + y over the triangle whose vertices are ( - J. i), (I, 2). (1. — 1). 

3. Find the numerical answer in Example 2.

4. (a) Let $a$ be a number $> 0$. Show that the area of the region consisting of
all points $(x, y)$ such that $|x| + |y| \le a$ is $(2a)^2/2!$.

(b) After you have read the beginning of the next section, show that the
volume of the 3-dimensional region consisting of all points $(x_1, x_2, x_3)$
such that $|x_1| + |x_2| + |x_3| \le a$ is equal to $(2a)^3/3!$.

(c) Generalize to $n$-space, showing that the volume of the analogous
$n$-dimensional region is $(2a)^n/n!$.


§2. Polar coordinates

It is frequently more convenient to describe a region by means of polar
coordinates than with the "rectangular" coordinates used in the preceding
section.

Let us consider two numbers $a$, $b$ with $a \le b$ and an interval

\[ a \le \theta \le b. \]

We shall also assume that $b \le a + 2\pi$. Let $c$, $d$ be two numbers with
$0 \le c \le d$. Then the set of points $S$ whose polar coordinates $(\theta, r)$
satisfy $a \le \theta \le b$ and $c \le r \le d$ form a region as shown in the figure.
Consider partitions

\[ a = \theta_1 \le \theta_2 \le \cdots \le \theta_n = b, \]
\[ c = r_1 \le r_2 \le \cdots \le r_m = d \]

of the two intervals $[a, b]$ and $[c, d]$. Each pair of intervals $[\theta_i, \theta_{i+1}]$ and
$[r_j, r_{j+1}]$ determines a small region as shown on the following figure.

The area of such a region is equal to the difference between the area of
the sector having angle $\theta_{i+1} - \theta_i$ and radius $r_{j+1}$, and the area of the
sector having the same angle but radius $r_j$. The area of a sector having
angle $\theta$ and radius $r$ is equal to

\[ \frac{\theta}{2\pi}\, \pi r^2 = \frac{\theta r^2}{2}. \]

Consequently the difference mentioned above is equal to

\[
\frac{(\theta_{i+1} - \theta_i)\, r_{j+1}^2}{2} - \frac{(\theta_{i+1} - \theta_i)\, r_j^2}{2}
= (\theta_{i+1} - \theta_i)\, \frac{r_{j+1} + r_j}{2}\, (r_{j+1} - r_j).
\]

We note that

\[ r_j \le \bar r_j = \frac{r_{j+1} + r_j}{2} \le r_{j+1}. \]
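
The identity between the difference of the two sector areas and the product form can be checked on sample values. This is a minimal sketch; the function names and the sample angles and radii are ours.

```python
import math

def sector_area(theta, r):
    """Area of a circular sector of angle theta (in radians) and radius r."""
    return theta * r * r / 2

def cell_area_difference(t0, t1, r0, r1):
    """Area of the small polar region as a difference of two sectors."""
    return sector_area(t1 - t0, r1) - sector_area(t1 - t0, r0)

def cell_area_product(t0, t1, r0, r1):
    """Same area written as (angle) * (mean radius) * (radial width)."""
    return (t1 - t0) * ((r1 + r0) / 2) * (r1 - r0)

print(cell_area_difference(0.3, 0.5, 1.0, 1.5))
print(cell_area_product(0.3, 0.5, 1.0, 1.5))
```

The mean radius $(r_{j+1} + r_j)/2$ appearing in the product is the source of the extra factor $r$ in integrals written in polar coordinates.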


Let $f$ be a continuous function of $(\theta, r)$ on the region $S$, and let
$\varphi(\theta, r) = f(\theta, r)\,r$. Then

\[
\sum_{j=1}^{m-1} \sum_{i=1}^{n-1} \varphi(\theta_i, \bar r_j)(r_{j+1} - r_j)(\theta_{i+1} - \theta_i),
\qquad \bar r_j = \frac{r_{j+1} + r_j}{2},
\]

is a Riemann sum for the function $\varphi$ on the product $[a, b] \times [c, d]$. The
function $f(\theta, r)$ can be viewed as a function of $(x, y)$, since $\theta$, $r$ are functions
of $x$ and $y$. Thus there is a function $f^*(x, y)$ such that, when we put
$x = r\cos\theta$ and $y = r\sin\theta$, we have $f^*(x, y) = f(\theta, r)$. Then our Riemann
sum above makes the following assertion very plausible.

Theorem 3. Let $f$ be a function of $(\theta, r)$ which is defined on the region $S$
discussed above, and is bounded, continuous except at a finite number of
smooth curves. Let $f^*$ be the corresponding function of $(x, y)$. Then

\[ \iint_S f(\theta, r)\, r\, dr\, d\theta = \iint_S f^*(x, y)\, dy\, dx. \]
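
The Riemann sum described above can be tried out numerically. The following is a minimal sketch, not part of the original text: the function name `polar_riemann_sum`, the use of equal subintervals, and the test function $f^*(x, y) = x^2 + y^2$ are all assumptions made for illustration. It uses the midpoint $(r_j + r_{j+1})/2$ as the weight of each small region, as in the area computation for the sectors.

```python
import math

def polar_riemann_sum(f, a, b, c, d, n=200, m=200):
    """Approximate the integral of f(theta, r) * r over a <= theta <= b, c <= r <= d."""
    d_theta = (b - a) / n          # width of each theta-subinterval
    d_r = (d - c) / m              # width of each r-subinterval
    total = 0.0
    for i in range(n):
        theta_i = a + i * d_theta  # left endpoint of the theta-subinterval
        for j in range(m):
            r_j = c + j * d_r
            r_mid = r_j + d_r / 2  # (r_j + r_{j+1}) / 2
            # f * r_mid * (r_{j+1} - r_j) * (theta_{i+1} - theta_i)
            total += f(theta_i, r_mid) * r_mid * d_r * d_theta
    return total

# f*(x, y) = x^2 + y^2 becomes f(theta, r) = r^2 in polar coordinates.
# Over the unit disc the exact value of the integral is pi / 2.
approx = polar_riemann_sum(lambda theta, r: r * r, 0.0, 2 * math.pi, 0.0, 1.0)
```

Dropping the factor `r_mid` from the sum would give a visibly different value, which is one way to see why the extra $r$ is needed.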





As with rectangular coordinates, we can deal with more general regions $S$.
Let $g_1$, $g_2$ be two smooth functions defined on the interval $[a, b]$ and assume

\[ 0 \le g_1(\theta) \le g_2(\theta) \]

for all $\theta$ in that interval. Let $S$ be the region consisting of all points $(\theta, r)$
such that $a \le \theta \le b$ and $g_1(\theta) \le r \le g_2(\theta)$. We can select two
numbers $c, d \ge 0$ such that

\[ c \le g_1(\theta) \le g_2(\theta) \le d \]

for all $\theta$ in the interval $[a, b]$. Let $f$ be continuous on $S$, and extend $f$ to
the circular sector of radius $d$ between $\theta = a$ and $\theta = b$ by giving it the
value 0 outside $S$. Then the integral of Theorem 3 taken over this sector is
equal to the repeated integral

\[ \int_a^b \int_{g_1(\theta)}^{g_2(\theta)} f(\theta, r)\, r\, dr\, d\theta. \]

The following picture shows a typical region $S$ under consideration.
The important thing to remember about the formula of Theorem 3 is the
appearance of an extra $r$ inside the integral.



We also remark that a region could be described by taking $\theta$ as a function
of $r$, and letting $r$ vary between two constant values. In view of
Theorem 2, we can evaluate the double integral of Theorem 3 by repeated
integration first with respect to $\theta$ and then with respect to $r$.

In dealing with polar coordinates, it is useful to remember the equation
of a circle. Let $a > 0$. Then

\[ r = a\cos\theta, \qquad -\pi/2 \le \theta \le \pi/2, \]

is the equation of a circle of radius $a/2$ and center $(a/2, 0)$. Similarly,

\[ r = a\sin\theta, \qquad 0 \le \theta \le \pi, \]





is the equation of a circle of radius $a/2$ and center $(0, a/2)$. You can easily
show this, as an exercise, using the relations

\[ r = \sqrt{x^2 + y^2}, \qquad x = r\cos\theta, \qquad y = r\sin\theta. \]

(Note. The coordinates of the center above are given in rectangular
coordinates.)
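
One possible route for the first circle, offered only as a sketch of the computation and assuming $r \ne 0$ when multiplying through by $r$, is the following chain of equalities. The second circle is handled the same way with $y = r\sin\theta$.

```latex
\[
r = a\cos\theta
\;\Longrightarrow\; r^2 = a\,r\cos\theta
\;\Longrightarrow\; x^2 + y^2 = a x
\;\Longrightarrow\; \left(x - \tfrac{a}{2}\right)^2 + y^2 = \left(\tfrac{a}{2}\right)^2 .
\]
```

The last equation is the rectangular equation of the circle of radius $a/2$ centered at $(a/2, 0)$.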


Exercises

1. By changing to polar coordinates, find the integral of        over the region
consisting of the points $(x, y)$ such that $x^2 + y^2 \le 1$.

2. Find the volume of the region lying over the disc $x^2 + (y - 1)^2 \le 1$ and
bounded from above by the function $z = x^2 + y^2$.

3. Find the integral of $e^{-(x^2+y^2)}$ over the circular disc bounded by

\[ x^2 + y^2 = a^2, \qquad a > 0. \]

4. What is




c dx dy? 


5. Find the mass of a square plate of side $a$ if the density is proportional to
the square of the distance from a vertex.

6. Find the mass of a circular disk of radius $a$ if the density is proportional
to the square of the distance from a point on the circumference. [Hint: Center
the disk at the origin, and let the point be $(a, 0)$ in rectangular coordinates.]

7. Find the mass of a plate bounded by one arch of the curve $y = \sin x$, and
the $x$-axis, if the density is proportional to the distance from the $x$-axis.

§3. Triple integrals

The discussion given in §1 and §2 concerning double integrals can be
applied to triple integrals, or multiple integrals of any number of variables.

We shall consider only triple integrals, since our main purpose is not to
give a general theoretical treatment of the subject, but some concrete,
rather computational comments.

Instead of considering rectangles, we consider rectangular boxes
determined by the product of three intervals. Everything we said concerning
Riemann sums would apply. It must be pointed out, however, that the
boundaries of our 3-dimensional regions cannot consist only of curves.
They have to be parametrized surfaces. These can be defined as follows.
Let $a$, $b$, $c$, $d$ be numbers, $a \le b$ and $c \le d$. We consider the rectangle
defined by $a \le t \le b$ and $c \le u \le d$. Let $f_1(t, u)$, $f_2(t, u)$, $f_3(t, u)$ be
three functions defined on the rectangle, and having continuous partial
derivatives. We call these smooth. The set of points $(x, y, z)$ in 3-space





consisting of all points

\[ (f_1(t, u),\ f_2(t, u),\ f_3(t, u)) \]

as $(t, u)$ ranges over the rectangle is called a smooth piece of surface. In
3 dimensions, we consider regions $S$ whose boundary consists of a finite
number of smooth pieces of surface. These are the analogues of the
smooth curves considered in §1.

With this modification, the three properties of §1 and Theorems 1 and 2
are true, taking a triple integral instead of a double integral.

We shall enumerate the formulas for triple integrals as repeated integrals
when a region is determined by inequalities as in §1 and §2. If $S$ now
denotes a 3-dimensional region, and $f$ a suitable function on $S$, we denote
the integral of $f$ over $S$ by

\[ \iiint_S f \qquad\text{or}\qquad \iiint_S f(x, y, z)\, dz\, dy\, dx. \]

If $f$ is positive, the integral can be viewed as a 4-dimensional volume of
a region lying above $S$ in 4-space. Furthermore, if $S$ is viewed as a solid
piece of material, and $f$ is regarded as representing a density distribution
over $S$, then the integral may be interpreted as the mass of $S$.


Case 1. Rectangular coordinates. Let $a$, $b$ be numbers, $a \le b$. Let
$g_1$, $g_2$ be two smooth functions defined on the interval $[a, b]$ such that

\[ g_1(x) \le g_2(x), \]

and let $h_1(x, y) \le h_2(x, y)$ be two smooth functions defined on the region
consisting of all points $(x, y)$ such that

\[ a \le x \le b, \qquad g_1(x) \le y \le g_2(x). \]

(By smooth functions of several variables, we shall mean functions having
continuous partial derivatives.) Let $S$ be the set of points $(x, y, z)$ such that

\[ a \le x \le b, \qquad g_1(x) \le y \le g_2(x), \qquad\text{and}\qquad h_1(x, y) \le z \le h_2(x, y). \]

Let $f$ be continuous on $S$. Then

\[
\iiint_S f = \int_a^b \left[ \int_{g_1(x)}^{g_2(x)} \left[ \int_{h_1(x,y)}^{h_2(x,y)} f(x, y, z)\, dz \right] dy \right] dx.
\]

For simplicity, the integral on the right will also be written without the
brackets.

Case 2. Cylindrical coordinates. We now take polar coordinates in the
$(x, y)$-plane, and keep our $z$-coordinate as before. Consider a region $S$





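consisting of all points $(\theta, r, z)$ satisfying conditions:

\[ a \le \theta \le b \qquad (b \le a + 2\pi) \]

\[ 0 \le g_1(\theta) \le r \le g_2(\theta) \]

with smooth functions $g_1$, $g_2$ defined on the interval $[a, b]$, and

\[ h_1(\theta, r) \le z \le h_2(\theta, r) \]

with smooth functions $h_1$, $h_2$ defined on the 2-dimensional region bounded
by $\theta = a$, $\theta = b$, and $g_1$, $g_2$, i.e. the region consisting of all points $(\theta, r)$
satisfying the above inequalities.

Let $f$ be continuous on this region $S$. Then

\[
\iiint_S f = \int_a^b \int_{g_1(\theta)}^{g_2(\theta)} \int_{h_1(\theta, r)}^{h_2(\theta, r)} f(\theta, r, z)\, r\, dz\, dr\, d\theta.
\]

(Observe the factor $r$ on the right!)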
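
The repeated integral in cylindrical coordinates can be approximated directly from the inequalities that define the region. The sketch below is illustrative only: `cylindrical_integral` is a hypothetical helper, the midpoint rule with `n` subintervals per variable is one choice among many, and the test solid (the cone $r \le z \le 1$ over the unit disc, with exact volume $\pi/3$) is an assumed example.

```python
import math

def cylindrical_integral(f, a, b, g1, g2, h1, h2, n=60):
    """Midpoint-rule approximation of the repeated integral of
    f(theta, r, z) * r dz dr dtheta over the region
    a <= theta <= b, g1(theta) <= r <= g2(theta), h1(theta, r) <= z <= h2(theta, r)."""
    total = 0.0
    d_theta = (b - a) / n
    for i in range(n):
        theta = a + (i + 0.5) * d_theta
        r_lo, r_hi = g1(theta), g2(theta)
        d_r = (r_hi - r_lo) / n
        for j in range(n):
            r = r_lo + (j + 0.5) * d_r
            z_lo, z_hi = h1(theta, r), h2(theta, r)
            d_z = (z_hi - z_lo) / n
            for k in range(n):
                z = z_lo + (k + 0.5) * d_z
                # note the extra factor r
                total += f(theta, r, z) * r * d_z * d_r * d_theta
    return total

# Volume (f = 1) of the solid cone r <= z <= 1 lying over the unit disc.
cone_volume = cylindrical_integral(
    lambda theta, r, z: 1.0,
    0.0, 2 * math.pi,
    lambda theta: 0.0, lambda theta: 1.0,
    lambda theta, r: r, lambda theta, r: 1.0,
)
```

Here the limits for $z$ depend on $r$, so the integral does not split into a product of three one-variable integrals.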

Case 3. Spherical coordinates. Let $(x, y, z)$ be the ordinary coordinates
of a point in 3-space. We let

\[ \rho = \sqrt{x^2 + y^2 + z^2} \]

and call this the spherical $\rho$ in 3-space, to distinguish it from its analogue
in the plane, namely the polar $r$.

Let $E_1$, $E_2$, $E_3$ be the three ordinary unit vectors in 3-space, giving rise
to the coordinates $(x, y, z)$ of a point. If $X$ is a vector, we let $\varphi$ be the
angle between $X$ and $E_3$, $0 \le \varphi \le \pi$. Then

\[ X \cdot E_3 = z = \rho\cos\varphi. \]

Using the value $\rho^2 = x^2 + y^2 + z^2$, we find

\[ x^2 + y^2 = \rho^2 - z^2 = \rho^2\sin^2\varphi. \]

Hence

\[ r = \sqrt{x^2 + y^2} = \rho\sin\varphi \]

(because both $r$ and $\sin\varphi$ are $\ge 0$).

Let $\theta$ be the usual polar $\theta$ in the plane. Then

\[ x = \rho\sin\varphi\cos\theta, \]
\[ y = \rho\sin\varphi\sin\theta, \]

and we had

\[ z = \rho\cos\varphi. \]





We call $(\varphi, \theta, \rho)$ the spherical coordinates of a point $(x, y, z)$. Any point
other than the origin has unique spherical coordinates provided we take
$\rho > 0$, $0 \le \varphi \le \pi$ and $0 \le \theta < 2\pi$.

Consider now the elementary spherical region defined by the inequalities

\[ \theta_1 \le \theta \le \theta_2, \qquad (\theta_2 \le \theta_1 + 2\pi) \]

\[ 0 \le \rho_1 \le \rho \le \rho_2, \]

\[ 0 \le \varphi_1 \le \varphi \le \varphi_2 \le \pi. \]

Then the volume of this box is equal to

\[ \bar\rho^{\,2}\sin\bar\varphi\,(\rho_2 - \rho_1)(\varphi_2 - \varphi_1)(\theta_2 - \theta_1) \]

for some number $\bar\rho$ between $\rho_1$ and $\rho_2$, and some number $\bar\varphi$ between $\varphi_1$
and $\varphi_2$.

We leave the proof of this statement as an exercise (cf. Exercise 1).
It is then plausible that the following assertion is true.

Let $a$, $b$ be numbers such that $0 < b - a \le 2\pi$.

Let $g_1(\theta)$, $g_2(\theta)$ be smooth functions of $\theta$, defined on the interval
$a \le \theta \le b$, and such that

\[ 0 \le g_1(\theta) \le g_2(\theta) \le \pi. \]

Let $h_1$, $h_2$ be functions of two variables, defined and smooth on the
region consisting of all points $(\theta, \varphi)$ such that

\[ a \le \theta \le b, \]

\[ g_1(\theta) \le \varphi \le g_2(\theta), \]

and such that $0 \le h_1(\theta, \varphi) \le h_2(\theta, \varphi)$ for all $(\theta, \varphi)$ in this region.

Let $S$ be the 3-dimensional region consisting of all points $(\theta, \varphi, \rho)$ such
that

\[ a \le \theta \le b, \qquad g_1(\theta) \le \varphi \le g_2(\theta), \qquad h_1(\theta, \varphi) \le \rho \le h_2(\theta, \varphi). \]





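Let $f$ be a function which is continuous on $S$. Then

\[
\iiint_S f = \int_a^b \int_{g_1(\theta)}^{g_2(\theta)} \int_{h_1(\theta, \varphi)}^{h_2(\theta, \varphi)} f(\theta, \varphi, \rho)\, \rho^2 \sin\varphi\, d\rho\, d\varphi\, d\theta.
\]

(Observe the factor $\rho^2\sin\varphi$ on the right!)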
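
As a rough numerical illustration of the spherical formula, the sketch below approximates the repeated integral with the weight $\rho^2\sin\varphi$ by the midpoint rule. The helper name `spherical_integral`, the number of subintervals, and the test case (the unit ball, whose volume is $4\pi/3$) are assumptions chosen for illustration, not part of the text.

```python
import math

def spherical_integral(f, a, b, g1, g2, h1, h2, n=50):
    """Midpoint-rule approximation of the repeated integral of
    f(theta, phi, rho) * rho^2 * sin(phi) d(rho) d(phi) d(theta) over
    a <= theta <= b, g1(theta) <= phi <= g2(theta),
    h1(theta, phi) <= rho <= h2(theta, phi)."""
    total = 0.0
    d_theta = (b - a) / n
    for i in range(n):
        theta = a + (i + 0.5) * d_theta
        phi_lo, phi_hi = g1(theta), g2(theta)
        d_phi = (phi_hi - phi_lo) / n
        for j in range(n):
            phi = phi_lo + (j + 0.5) * d_phi
            rho_lo, rho_hi = h1(theta, phi), h2(theta, phi)
            d_rho = (rho_hi - rho_lo) / n
            for k in range(n):
                rho = rho_lo + (k + 0.5) * d_rho
                weight = rho ** 2 * math.sin(phi)   # the spherical factor
                total += f(theta, phi, rho) * weight * d_rho * d_phi * d_theta
    return total

# Volume of the unit ball: f = 1, 0 <= theta <= 2*pi, 0 <= phi <= pi, 0 <= rho <= 1.
ball_volume = spherical_integral(
    lambda theta, phi, rho: 1.0,
    0.0, 2 * math.pi,
    lambda theta: 0.0, lambda theta: math.pi,
    lambda theta, phi: 0.0, lambda theta, phi: 1.0,
)
```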

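Example. Find the mass of a solid body $S$ determined by the inequalities
of spherical coordinates:

\[ 0 \le \theta \le \frac{\pi}{2}, \qquad \frac{\pi}{4} \le \varphi \le \arctan 2, \qquad 0 \le \rho \le \sqrt{6}, \]

if the density, given as a function of the spherical coordinates $(\theta, \varphi, \rho)$,
is equal to $1/\rho$.

To find the mass, we have to integrate the given function over the
region. The integral is given by

\[
\iiint_S f = \int_0^{\pi/2} \int_{\pi/4}^{\arctan 2} \int_0^{\sqrt{6}} \frac{1}{\rho}\, \rho^2 \sin\varphi\, d\rho\, d\varphi\, d\theta.
\]

Performing the repeated integral, we obtain
$\frac{3\pi}{2}\left(\frac{1}{\sqrt{2}} - \frac{1}{\sqrt{5}}\right)$. We note
that in the present example, the limits of integration are constants, and
hence the repeated integral is equal to a product of the integrals

\[
\int_0^{\pi/2} d\theta \cdot \int_{\pi/4}^{\arctan 2} \sin\varphi\, d\varphi \cdot \int_0^{\sqrt{6}} \rho\, d\rho.
\]

Each integration can be performed separately. Of course, this does not
hold when the limits of integration are non-constant functions.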
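
The product of three one-variable integrals can be evaluated factor by factor. The short sketch below does this under the assumption that the region is $0 \le \theta \le \pi/2$, $\pi/4 \le \varphi \le \arctan 2$, $0 \le \rho \le \sqrt{6}$ with density $1/\rho$; the variable names are illustrative.

```python
import math

# Each factor is an elementary one-variable integral (limits as assumed above).
theta_factor = math.pi / 2 - 0.0                                 # integral of d(theta) over [0, pi/2]
phi_factor = math.cos(math.pi / 4) - math.cos(math.atan(2.0))    # integral of sin(phi) over [pi/4, arctan 2]
rho_factor = math.sqrt(6.0) ** 2 / 2                             # integral of rho over [0, sqrt(6)]

mass = theta_factor * phi_factor * rho_factor

# Closed form of the same product, using cos(arctan 2) = 1 / sqrt(5).
closed_form = (3 * math.pi / 2) * (1 / math.sqrt(2) - 1 / math.sqrt(5))
```

The integrand $\rho\sin\varphi$ is what remains after the density $1/\rho$ is multiplied by the spherical factor $\rho^2\sin\varphi$.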


Exercises

1. Prove the assertion concerning the volume of the elementary spherical
region made in the text by proceeding as follows.

(a) Let $\theta_1$, $\theta_2$, $\rho_1$, $\varphi_1$ be numbers such that

\[ 0 < \theta_2 - \theta_1 \le 2\pi, \qquad \rho_1 > 0, \qquad 0 < \varphi_1 \le \pi. \]

Let $S$ be the region defined by the inequalities

\[ \theta_1 \le \theta \le \theta_2, \qquad 0 \le \rho \le \rho_1, \qquad 0 \le \varphi \le \varphi_1. \]

This region is illustrated in the figure at the left on the next page.

The region $S$ can be denoted by $S(\varphi_1, \rho_1)$. Show that the volume
of $S$ is

\[ \tfrac{1}{3}\rho_1^3 (1 - \cos\varphi_1)(\theta_2 - \theta_1). \]






Do this by transforming to cylindrical coordinates, using the relations
$r = \rho\sin\varphi$ and $z = \rho\cos\varphi$.

Assume first that $\varphi_1 \le \pi/2$. In cylindrical coordinates, the region can
be described by the inequalities

\[ \theta_1 \le \theta \le \theta_2, \qquad 0 \le r \le \rho_1\sin\varphi_1, \qquad r\cot\varphi_1 \le z \le \sqrt{\rho_1^2 - r^2}. \]

Thus $\theta$, $r$ vary between fixed limits, and $z$ varies on limits depending on
$r$. The integral is then easy to evaluate. If $\pi/2 \le \varphi_1 \le \pi$, use a
symmetry argument to show that the volume of $S(\varphi_1, \rho_1)$ is still given
by the same formula.

(b) Let $\theta_1$, $\theta_2$, $\varphi_1$, $\varphi_2$, $\rho_1$, $\rho_2$ describe the elementary spherical region as in
the text. Then the volume of this elementary spherical region can be
expressed as a difference of volumes of type occurring in (a), namely:

[Volume of $S(\varphi_2, \rho_2)$ - Volume of $S(\varphi_1, \rho_2)$]

- [Volume of $S(\varphi_2, \rho_1)$ - Volume of $S(\varphi_1, \rho_1)$].

These volumes are those taken first with the larger value $\rho_2$, with $\varphi$
lying between $\varphi_1$ and $\varphi_2$, and secondly with the smaller value $\rho_1$, again
with $\varphi$ lying between $\varphi_1$ and $\varphi_2$. The situation is illustrated in the
figure at the right above. You will find that the volume of the
elementary spherical region is then exactly

\[ \tfrac{1}{3}(\rho_2^3 - \rho_1^3)(\cos\varphi_1 - \cos\varphi_2)(\theta_2 - \theta_1). \]

To obtain the assertion in the text, use the mean value theorem on
the first factors in this product.

2. Find the integral


n ilA# /pCOtl A 

, /„ 



3. Find the mass of a spherical ball of radius $a > 0$ if the density at any
point is equal to a constant $k$ times the distance of that point to the center.








4. Find the mass of a spherical shell of inside radius $a$ and outside radius $b$ if
the density at any point is inversely proportional to the distance from the center.

5. Find the integral of the function $f(x, y, z) = $        over that portion of the

cylinder        lying between the planes $z = 0$ and $z = b > 0$.

6. Find the mass of a sphere of radius $a$ if the density at any point is
proportional to the distance from a fixed plane passing through a diameter.

7 . Find the volume of the region bounded by the cylinder y = cos x, and the 
planes 

z = y, X - 0 , X = -Tf and 2 = 0 . 


8 . Find the volume of the region bounded above by the sphere 


-f -f 22 = 


and below by the surface z ^ x^ y^. 

9. Find the volume of that portion of the sphere $x^2 + y^2 + z^2 = a^2$, which
is inside the cylinder $r = a\sin\theta$, using cylindrical coordinates.

10. Find the volume above the cone $z^2 = x^2 + y^2$ and inside the sphere
$\rho = 2a\cos\varphi$ (spherical coordinates). (Draw a picture. What is the center of
the sphere? What is the equation of the cone in spherical coordinates?)

11. Find the volumes of the following regions, in 3-space.

(a) Bounded above by the plane $z = 1$, and below by the top half of
$z^2 = x^2 + y^2$.

(b) Bounded above and below by $z^2 = x^2 + y^2$, and on the sides by
$x^2 + y^2 + z^2 = 1$.

(c) Bounded above by $z = x^2 + y^2$, below by $z = 0$, and on the sides
by $x^2 + y^2 = 1$.

(d) Bounded above by $z = x$, and below by $z = x^2 + y^2$.


12 . Find the integral of the following functions over the indicated region, in 
3 -space. 

(a) /(x, y, z) — x2 over the tetrahedron bounded by the plane 12x + 
20 y + 152 = 60 , and the coordinate planes. 

(b) /(x, y, 2) = y over the tetrahedron as in (a). 

(c) $f(x, y, z) = 7yz$ over the region on the positive side of the $xz$-plane,
bounded by the planes $y = 0$, $z = 0$, and $z = a$ (for some positive
number $a$), and the cylinder $x^2 + y^2 = b^2$ ($b > 0$).



CHAPTER IX 


Vector Spaces 

This chapter and all subsequent ones (except Chapter XIII) can be 
read immediately after Chapter I. In fact, we start where we left off, 
introducing the notions of linear independence and bases, which allow 
us to speak of dimension. 

As usual, a collection of objects will be called a set. A member of the
collection is also called an element of the set. It is useful in practice to use
short symbols to denote certain sets. For instance we denote by $R$ the
set of all numbers. To say that "$x$ is a number" or that "$x$ is an element
of $R$" amounts to the same thing. The set of $n$-tuples of numbers will
be denoted by $R^n$. Thus "$X$ is an element of $R^n$" and "$X$ is an $n$-tuple"
mean the same thing. Instead of saying that $u$ is an element of a set $S$, we
shall also frequently say that $u$ lies in $S$. If $S$ and $S'$ are two sets, and if
every element of $S'$ is an element of $S$, then we say that $S'$ is a subset of $S$.
Thus the set of rational numbers is a subset of the set of (real) numbers.
To say that $S$ is a subset of $S'$ is to say that $S$ is part of $S'$.

§1. Definition

We have met already several types of objects which can be added and 
multiplied by numbers. Among these are vectors (of the same dimension) 
and functions. It is now convenient to define in general a notion which 
includes these as a special case. 

A vector space $V$ is a set of objects which can be added and multiplied
by numbers, in such a way that the sum of two elements of $V$ is again an
element of $V$, the product of an element of $V$ by a number is an element of
$V$, and the following properties are satisfied:

VS 1. Given elements $u$, $v$, $w$ of $V$ we have

\[ (u + v) + w = u + (v + w). \]

VS 2. There is an element of $V$, denoted by $O$, such that

\[ O + u = u + O = u \]

for all elements $u$ of $V$.

VS 3. Given an element $u$ of $V$, the element $(-1)u$ is such that

\[ u + (-1)u = O. \]



VS 4. For all elements $u$, $v$ of $V$, we have

\[ u + v = v + u. \]

VS 5. If $c$ is a number, then $c(u + v) = cu + cv$.

VS 6. If $a$, $b$ are two numbers, then $(a + b)v = av + bv$.

VS 7. If $a$, $b$ are two numbers, then $(ab)v = a(bv)$.

VS 8. For all elements $u$ of $V$, we have $1 \cdot u = u$ (1 here is the number one).

We have used all these rules when dealing with vectors, or with functions,
but we wish to be more systematic from now on, and hence have made a
list of them. Further properties which can be easily deduced from these
are listed as exercises and will be assumed from now on.

The sum $u + (-1)v$ is usually written $u - v$. We also write $-v$ instead
of $(-1)v$.

We shall use 0 to denote the number zero, and $O$ to denote the element
of any vector space $V$ satisfying property VS 2. We also call it zero, but
there is never any possibility of confusion. We observe that this zero
element $O$ is uniquely determined by condition VS 2 (cf. Exercise 5).

It is possible to add several elements of a vector space. Suppose we wish
to add four elements, say $u$, $v$, $w$, $z$. We first add any two of them, then a
third, and finally a fourth. Using the rules VS 1 and VS 4, we see that it
does not matter in which order we perform the additions. This is exactly
the same situation as we had with vectors. For example, we have

\[ ((u + v) + w) + z = (u + (v + w)) + z \]

\[ = ((v + w) + u) + z \]

\[ = (v + w) + (u + z), \]

etc.

Thus it is customary to leave out the parentheses, and write simply

\[ u + v + w + z. \]

The same remark applies to the sum of any number $n$ of elements of $V$,
and a formal proof could easily be given by induction.

Let $V$ be a vector space, and let $W$ be a subset of $V$. Assume that $W$
satisfies the following conditions.

(i) If $v$, $w$ are elements of $W$, their sum $v + w$ is also an element of $W$.
(ii) If $v$ is an element of $W$ and $c$ a number, then $cv$ is an element of $W$.
(iii) The element $O$ of $V$ is also an element of $W$.

Then $W$ itself is a vector space. Indeed, properties VS 1 through VS 8
being satisfied for all elements of $V$ are satisfied a fortiori for the elements
of $W$. We shall call $W$ a subspace of $V$.





Example 1. Let $V = R^n$ and let $W$ be the set of vectors in $V$ whose
last coordinate is equal to 0. Then $W$ is a subspace of $V$, which we could
identify with $R^{n-1}$.

Example 2. Let $V$ be an arbitrary vector space, and let $v_1, \ldots, v_n$ be
elements of $V$. Let $x_1, \ldots, x_n$ be numbers. An expression of type

\[ x_1 v_1 + \cdots + x_n v_n \]

is called a linear combination of $v_1, \ldots, v_n$. The set of all linear
combinations of $v_1, \ldots, v_n$ is a subspace of $V$.

Proof. Let $W$ be the set of all such linear combinations, and let
$y_1, \ldots, y_n$ be numbers. Then

\[ (x_1 v_1 + \cdots + x_n v_n) + (y_1 v_1 + \cdots + y_n v_n) = (x_1 + y_1)v_1 + \cdots + (x_n + y_n)v_n. \]

Thus the sum of two elements of $W$ is again an element of $W$, i.e. a linear
combination of $v_1, \ldots, v_n$. Furthermore, if $c$ is a number, then

\[ c(x_1 v_1 + \cdots + x_n v_n) = c x_1 v_1 + \cdots + c x_n v_n \]

is a linear combination of $v_1, \ldots, v_n$, and hence is an element of $W$.

Finally,

\[ O = 0v_1 + \cdots + 0v_n \]

is an element of $W$. This proves that $W$ is a subspace of $V$.

In Example 2, the subspace $W$ is called the subspace generated by
$v_1, \ldots, v_n$. If $W = V$, i.e. if every element of $V$ is a linear combination
of $v_1, \ldots, v_n$, then we say that $v_1, \ldots, v_n$ generate $V$.

Example 3. Let $V$ be the set of all functions defined for all numbers.
If $f$, $g$ are two functions, then we know how to form their sum $f + g$. It
is the function whose value at a number $t$ is $f(t) + g(t)$. We also know how
to multiply $f$ by a number $c$. It is the function $cf$ whose value at a number $t$
is $cf(t)$. In dealing with functions, we have used properties VS 1 through
VS 8 many times. We now realize that the set of functions is a vector
space.

If $f$, $g$ are two continuous functions, then $f + g$ is continuous. If $c$ is a
number, then $cf$ is continuous. The zero function is continuous. Hence
the continuous functions form a subspace of the vector space of all
functions.

If $f$, $g$ are two differentiable functions, then their sum $f + g$ is
differentiable. If $c$ is a number, then $cf$ is differentiable. The zero function
is differentiable. Hence the differentiable functions form a subspace of
the vector space of all functions. Furthermore, every differentiable
function is continuous. Hence the differentiable functions form a subspace
of the vector space of continuous functions.

Consider the two functions $e^t$, $e^{2t}$. These generate a subspace of the
space of all differentiable functions. The function $3e^t + 2e^{2t}$ is an element
of this subspace. So is the function \/2 c' + 


Exercises 

1. Let $V$ be a vector space. Using the properties VS 1 through VS 8, show
that if $v$ is an element of $V$ and 0 is the number zero, then $0v = O$.

2. Let $c$ be a number $\ne 0$ and $v$ an element of $V$. Prove that if $cv = O$, then
$v = O$.

3. In the vector space of functions, what is the function satisfying the
condition VS 2?

4. Let $V$ be a vector space and $v$, $w$ two elements of $V$. If $v + w = O$, show
that $w = -v$.

5. Let $V$ be a vector space, and $v$, $w$ two elements of $V$ such that $v + w = v$.
Show that $w = O$.


§2. Bases 

Let $V$ be a vector space, and let $v_1, \ldots, v_n$ be elements of $V$. We shall
say that $v_1, \ldots, v_n$ are linearly dependent if there exist numbers $a_1, \ldots, a_n$
not all equal to 0 such that

\[ a_1 v_1 + \cdots + a_n v_n = O. \]

If there do not exist such numbers, then we say that $v_1, \ldots, v_n$ are linearly
independent.

Example 1. Let $V = R^n$ and consider the vectors

\[ E_1 = (1, 0, \ldots, 0) \]
\[ \vdots \]
\[ E_n = (0, 0, \ldots, 1). \]

Then $E_1, \ldots, E_n$ are linearly independent. Indeed, let $a_1, \ldots, a_n$ be
numbers such that $a_1 E_1 + \cdots + a_n E_n = O$. Since

\[ a_1 E_1 + \cdots + a_n E_n = (a_1, \ldots, a_n), \]

it follows that all $a_i = 0$.

Example 2. Let $V$ be the vector space of all functions of a variable $t$.
Let $f_1(t), \ldots, f_n(t)$ be $n$ functions. To say that they are linearly
dependent is to say that there exist $n$ numbers $a_1, \ldots, a_n$ not all equal to 0
such that

\[ a_1 f_1(t) + \cdots + a_n f_n(t) = 0 \]

for all values of $t$.

The two functions $e^t$, $e^{2t}$ are linearly independent. To prove this,
suppose that there are numbers $a$, $b$ such that

\[ ae^t + be^{2t} = 0 \]

(for all values of $t$). Differentiate this relation. We obtain

\[ ae^t + 2be^{2t} = 0. \]

Subtract the first from the second relation. We obtain $be^{2t} = 0$, and hence
$b = 0$. From the first relation, it follows that $ae^t = 0$, and hence $a = 0$.
Hence $e^t$, $e^{2t}$ are linearly independent.
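
The argument above can be recast as a small computation: at a fixed value of $t$, the relation and its derivative form two linear equations in $a$ and $b$. The sketch below, with hypothetical helper names, writes down the coefficients of that system and checks that the expression $ad - bc$ built from them is not 0, which forces $a = b = 0$. Using this determinant-style test instead of subtracting the two relations is a substitution made here for illustration.

```python
import math

def relation_matrix_at(t):
    """Coefficients of (a, b) in a*e^t + b*e^(2t) = 0 (first row)
    and in the differentiated relation a*e^t + 2*b*e^(2t) = 0 (second row)."""
    return [[math.exp(t), math.exp(2 * t)],
            [math.exp(t), 2 * math.exp(2 * t)]]

def det2(m):
    """The quantity ad - bc for a 2 x 2 array [[a, b], [c, d]]."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# At t = 0 the system reads a + b = 0, a + 2b = 0.
d0 = det2(relation_matrix_at(0.0))
```

In general the quantity equals $e^{3t}$, which is never 0, so the same conclusion holds at every $t$.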

Consider again an arbitrary vector space $V$. Let $v_1, \ldots, v_n$ be linearly
independent elements of $V$. Let $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ be numbers.
Suppose that we have

\[ x_1 v_1 + \cdots + x_n v_n = y_1 v_1 + \cdots + y_n v_n. \]

In other words, two linear combinations of $v_1, \ldots, v_n$ are equal. Then
we must have $x_i = y_i$ for each $i = 1, \ldots, n$. Indeed, subtracting the
right-hand side from the left-hand side, we get

\[ x_1 v_1 - y_1 v_1 + \cdots + x_n v_n - y_n v_n = O. \]

We can write this relation also in the form

\[ (x_1 - y_1)v_1 + \cdots + (x_n - y_n)v_n = O. \]

By definition, we must have $x_i - y_i = 0$ for all $i = 1, \ldots, n$, thereby
proving our assertion.

If elements $v_1, \ldots, v_n$ of $V$ generate $V$ and in addition are linearly
independent, then the set consisting of these elements is called a basis of $V$.

We shall also say that the elements $v_1, \ldots, v_n$ constitute or form a basis of $V$.

As a matter of notation, if $s_1, \ldots, s_n$ are objects, then the set
consisting of these objects is denoted by $\{s_1, \ldots, s_n\}$. If elements $v_1, \ldots, v_n$
of $V$ generate $V$ and are linearly independent, then we shall say that
$\{v_1, \ldots, v_n\}$ is a basis.

The vectors $E_1, \ldots, E_n$ of Example 1 form a basis of $R^n$.

Let $W$ be the vector space of functions generated by the two functions
$e^t$, $e^{2t}$. Then $\{e^t, e^{2t}\}$ is a basis of $W$.

Let $V$ be a vector space, and let $\{v_1, \ldots, v_n\}$ be a basis of $V$. The
elements of $V$ can be represented by $n$-tuples relative to this basis, as follows.





If an element $v$ of $V$ is written as a linear combination

\[ v = x_1 v_1 + \cdots + x_n v_n \]

of the basis elements, then we call $(x_1, \ldots, x_n)$ the coordinates of $v$ with
respect to our basis, and we call $x_i$ the $i$-th coordinate.

For example, let $V$ be the vector space of functions generated by the
two functions $e^t$, $e^{2t}$. Then the coordinates of the function

\[ 3e^t + 5e^{2t} \]

with respect to the basis $\{e^t, e^{2t}\}$ are $(3, 5)$.

Example 3. Show that the vectors $(1, 1)$ and $(-3, 2)$ are linearly
independent.

Let $a$, $b$ be two numbers such that

\[ a(1, 1) + b(-3, 2) = O. \]

Writing this equation in terms of components, we find

\[ a - 3b = 0, \]
\[ a + 2b = 0. \]

This is a system of two equations which we solve for $a$ and $b$. Subtracting
the second from the first, we get $-5b = 0$, whence $b = 0$. Substituting
in either equation, we find $a = 0$. Hence $a$, $b$ are both 0, and our vectors
are linearly independent.
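
For two vectors in the plane, the elimination above always comes down to a single number. The sketch below uses the criterion $ad - bc \ne 0$ for vectors $(a, b)$ and $(c, d)$, which is the statement proved in the exercises of this section rather than the elimination carried out in the example; the function name `are_independent_2d` is an illustrative choice.

```python
def are_independent_2d(v, w):
    """Two plane vectors (a, b) and (c, d) are linearly independent
    exactly when ad - bc is not 0."""
    a, b = v
    c, d = w
    return a * d - b * c != 0

# The vectors of Example 3: 1*2 - 1*(-3) = 5, which is not 0.
example_3 = are_independent_2d((1, 1), (-3, 2))

# A dependent pair for comparison: (2, 4) = 2 * (1, 2).
dependent_pair = are_independent_2d((1, 2), (2, 4))
```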

Example 4. Find the coordinates of $(1, 0)$ with respect to the two
vectors $(1, 1)$ and $(-1, 2)$.

We must find numbers $a$, $b$ such that

\[ a(1, 1) + b(-1, 2) = (1, 0). \]

Writing this equation in terms of coordinates, we find

\[ a - b = 1, \]
\[ a + 2b = 0. \]

Solving for $a$ and $b$ in the usual manner yields $b = -\tfrac{1}{3}$ and $a = \tfrac{2}{3}$.
Hence the coordinates of $(1, 0)$ with respect to $(1, 1)$ and $(-1, 2)$ are

\[ \left(\tfrac{2}{3}, -\tfrac{1}{3}\right). \]
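
The coordinates in Example 4 can be checked with exact fractions. The sketch below solves the two equations by Cramer's rule, a technique not used in the text and substituted here because it gives a closed formula; `coordinates_2d` is a hypothetical helper name.

```python
from fractions import Fraction

def coordinates_2d(x, v, w):
    """Return (a, b) with a*v + b*w = x, for linearly independent plane vectors v, w.

    The system is  a*v[0] + b*w[0] = x[0],  a*v[1] + b*w[1] = x[1],
    solved here by Cramer's rule with exact rational arithmetic."""
    det = Fraction(v[0] * w[1] - v[1] * w[0])
    if det == 0:
        raise ValueError("v and w are linearly dependent")
    a = Fraction(x[0] * w[1] - x[1] * w[0]) / det
    b = Fraction(v[0] * x[1] - v[1] * x[0]) / det
    return a, b

# Example 4: coordinates of (1, 0) with respect to (1, 1) and (-1, 2).
coords = coordinates_2d((1, 0), (1, 1), (-1, 2))
```

Substituting the result back, $\tfrac{2}{3}(1, 1) - \tfrac{1}{3}(-1, 2) = (1, 0)$, confirms the coordinates.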

Let $\{v_1, \ldots, v_n\}$ be a set of elements of a vector space $V$. Let $r$ be a
positive integer $\le n$. We shall say that $\{v_1, \ldots, v_r\}$ is a maximal subset
of linearly independent elements if $v_1, \ldots, v_r$ are linearly independent,




and if in addition, given any $v_i$ with $i > r$, the elements $v_1, \ldots, v_r, v_i$
are linearly dependent.

The next theorem gives us a useful criterion to determine when a set of
elements of a vector space is a basis.

Theorem 1. Let $\{v_1, \ldots, v_n\}$ be a set of generators of a vector space $V$.
Let $\{v_1, \ldots, v_r\}$ be a maximal subset of linearly independent elements.
Then $\{v_1, \ldots, v_r\}$ is a basis of $V$.

Proof. We must prove that $v_1, \ldots, v_r$ generate $V$. We shall first prove
that each $v_i$ (for $i > r$) is a linear combination of $v_1, \ldots, v_r$. By
hypothesis, given $v_i$, there exist numbers $x_1, \ldots, x_r, y$ not all 0, such that

\[ x_1 v_1 + \cdots + x_r v_r + y v_i = O. \]

Furthermore, $y \ne 0$, because otherwise, we would have a relation of
linear dependence for $v_1, \ldots, v_r$. Hence we can solve for $v_i$, namely

\[ v_i = \frac{x_1}{-y}\, v_1 + \cdots + \frac{x_r}{-y}\, v_r, \]

thereby showing that $v_i$ is a linear combination of $v_1, \ldots, v_r$.

Next, let $v$ be any element of $V$. There exist numbers $c_1, \ldots, c_n$ such
that

\[ v = c_1 v_1 + \cdots + c_n v_n. \]

In this relation, we can replace each $v_i$ ($i > r$) by a linear combination of
$v_1, \ldots, v_r$. If we do this, and then collect terms, we find that we have
expressed $v$ as a linear combination of $v_1, \ldots, v_r$. This proves that
$v_1, \ldots, v_r$ generate $V$, and hence form a basis of $V$.
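
Theorem 1 suggests a practical way to extract a basis from a list of generators in $R^n$: go through the list and keep each vector that is not a linear combination of those already kept. The sketch below is one possible realization and rests on assumptions not made in the text: it tests independence by counting pivots in Gaussian elimination (the helper `rank`), and it keeps vectors in the order given rather than assuming the independent ones come first.

```python
from fractions import Fraction

def rank(vectors):
    """Number of pivots found by Gaussian elimination over exact fractions.
    A list of k vectors is linearly independent exactly when this equals k."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        # look for a row at or below position r with a nonzero entry in this column
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def independent_subset(generators):
    """Keep each generator that is independent of the ones already kept."""
    kept = []
    for v in generators:
        if rank(kept + [v]) == len(kept) + 1:
            kept.append(v)
    return kept

# (2, 2, 0) and (1, 2, 1) are combinations of earlier vectors and are dropped.
basis = independent_subset([(1, 1, 0), (2, 2, 0), (0, 1, 1), (1, 2, 1), (0, 0, 1)])
```

Every discarded vector depends linearly on the kept ones, so after renumbering the kept vectors form a maximal subset in the sense of the theorem.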


Exercises

1. Show that the following vectors are linearly independent.

(a) $(1, 1, 1)$ and $(0, 1, -1)$ (b) $(1, 0)$ and $(1, 1)$

(c) $(-1, 1, 0)$ and $(0, 1, 2)$ (d) $(2, -1)$ and $(1, 0)$

(e) $(\pi, 0)$ and $(0, 1)$ (f) $(1, 2)$ and $(1, 3)$

(g) $(1, 1, 0)$, $(1, 1, 1)$ and $(0, 1, -1)$

(h) $(0, 1, 1)$, $(0, 2, 1)$ and $(1, 5, 3)$

2. Express the given vector $X$ as a linear combination of the given vectors
$A$, $B$, and find the coordinates of $X$ with respect to $A$, $B$.

(a) $X = (1, 0)$, $A = (1, 1)$, $B = (0, 1)$

(b) $X = (2, 1)$, $A = (1, -1)$, $B = (1, 1)$

(c) $X = (1, 1)$, $A = (2, 1)$, $B = (-1, 0)$

(d) $X = (4, 3)$, $A = (2, 1)$, $B = (-1, 0)$





3. Find the coordinates of the vector $X$ with respect to the vectors $A$, $B$, $C$.

(a) $X = (1, 0, 0)$, $A = (1, 1, 1)$, $B = (-1, 1, 0)$, $C = (1, 0, -1)$

(b) $X = (1, 1, 1)$, $A = (0, 1, -1)$, $B = (1, 1, 0)$, $C = (1, 0, 2)$

(c) $X = (0, 0, 1)$, $A = (1, 1, 1)$, $B = (-1, 1, 0)$, $C = (1, 0, -1)$

4. Let $(a, b)$ and $(c, d)$ be two vectors in the plane. If $ad - bc = 0$, show that
they are linearly dependent. If $ad - bc \ne 0$, show that they are linearly
independent.

5. Consider the vector space of all functions of a variable t. Show that the 
following pairs of functions are linearly independent. 

(a) l,t (b) (, i- (c) t,t* (d) e‘,t (e) (c', (f) sin (, cos/ (g) /, sin i 

(h) $\sin t$, $\sin 2t$ (i) $\cos t$, $\cos 3t$

6. Consider the vector space of functions defined for $t > 0$. Show that
the following pairs of functions are linearly independent.

(a) $t$, $1/t$ (b) $e^t$, $\log t$

7. What are the coordinates of the function $3\sin t + 5\cos t = f(t)$ with
respect to the basis $\{\sin t, \cos t\}$?

8. Let $D$ be the derivative $d/dt$. Let $f(t)$ be as in Exercise 7. What are the
coordinates of the function $Df(t)$ with respect to the basis of Exercise 7?

9. Let $A_1, \ldots, A_r$ be vectors in $R^n$ and assume that they are mutually
perpendicular (i.e. any two of them are perpendicular), and that none of them is
equal to $O$. Prove that they are linearly independent.

10. Let $V$ be the vector space of continuous functions on the interval $[-\pi, \pi]$.
If $f$, $g$ are two continuous functions on this interval, define their scalar product
$\langle f, g \rangle$ to be

\[ \langle f, g \rangle = \int_{-\pi}^{\pi} f(t)\, g(t)\, dt. \]

Show that the functions $\sin nt$ ($n = 1, 2, 3, \ldots$) are mutually perpendicular,
i.e. that the scalar product of any two of them is equal to 0.

11. Show that the functions $\sin t$, $\sin 2t$, $\sin 3t$, \ldots, $\sin nt$ are linearly
independent, for any integer $n \ge 1$.



CHAPTER X 


Linear Equations and Bases 

You have met linear equations in elementary school. Linear equations 
are simply equations like 

2x y z = 1, 

5x — y + 7z = 0. 

You have learned to solve such equations by the successive elimination of
the variables. In this chapter, we shall review the theory of such equations,
dealing with equations in $n$ variables, and interpreting our results from the
point of view of vectors. Several geometric interpretations for the solutions
of the equations will be given.

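§1. Matrices

We consider a new kind of object, matrices.

Let $n$, $m$ be two integers $\ge 1$. An array of numbers

\[
\begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
\vdots & \vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}
\end{pmatrix}
\]

is called a matrix. We can abbreviate the notation for this matrix by
writing it $(a_{ij})$, $i = 1, \ldots, m$ and $j = 1, \ldots, n$. We say that it is an $m$
by $n$ matrix, or an $m \times n$ matrix. The matrix has $m$ rows and $n$ columns.
For instance, the first column is

\[
\begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{pmatrix}
\]

and the second row is $(a_{21}, a_{22}, \ldots, a_{2n})$. We call $a_{ij}$ the $ij$-entry or
$ij$-component of the matrix.

Example 1. The following is a $2 \times 3$ matrix:

\[
\begin{pmatrix} 1 & 1 & -2 \\ -1 & 4 & -5 \end{pmatrix}.
\]

It has two rows and three columns.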




The rows are $(1, 1, -2)$ and $(-1, 4, -5)$. The columns are

\[
\begin{pmatrix} 1 \\ -1 \end{pmatrix}, \qquad
\begin{pmatrix} 1 \\ 4 \end{pmatrix}, \qquad
\begin{pmatrix} -2 \\ -5 \end{pmatrix}.
\]

Thus the rows of a matrix may be viewed as $n$-tuples, and the columns
may be viewed as vertical $m$-tuples. A vertical $m$-tuple is also called a
column vector.

A vector $(x_1, \ldots, x_n)$ is a $1 \times n$ matrix. A column vector

\[
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
\]

is an $n \times 1$ matrix.

When we write a matrix in the form $(a_{ij})$, then $i$ denotes the row and
$j$ denotes the column. In Example 1, we have for instance $a_{11} = 1$,
$a_{23} = -5$.

A single number $(a)$ may be viewed as a $1 \times 1$ matrix.

Let $(a_{ij})$, $i = 1, \ldots, m$ and $j = 1, \ldots, n$ be a matrix. If $m = n$,
then we say that it is a square matrix. Thus



are both square matrices. 

We have a zero matrix, in which $a_{ij} = 0$ for all $i$, $j$. It looks like this:

\[
\begin{pmatrix}
0 & 0 & 0 & \cdots & 0 \\
0 & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & 0
\end{pmatrix}
\]

We shall write it $O$. We note that we have met so far with the zero
number, zero vector, and zero matrix.

We shall now define addition of matrices and multiplication of matrices
by numbers.

We define addition of matrices only when they have the same size.
Thus let $m$, $n$ be fixed integers $\ge 1$. Let $A = (a_{ij})$ and $B = (b_{ij})$ be
two $m \times n$ matrices. We define $A + B$ to be the matrix whose entry in
the $i$-th row and $j$-th column is $a_{ij} + b_{ij}$. In other words, we add matrices
of the same size componentwise.

Example 2. Let







Then 


A + B = 



If $A$, $B$ are both $1 \times n$ matrices, i.e. $n$-tuples, then we note that our
addition of matrices coincides with the addition which we defined in
Chapter I for $n$-tuples.

If $O$ is the zero matrix, then for any matrix $A$ (of the same size, of
course), we have $O + A = A + O = A$. This is trivially verified.

We shall now define the multiplication of a matrix by a number. Let
$c$ be a number, and $A = (a_{ij})$ be a matrix. We define $cA$ to be the matrix
whose $ij$-component is $ca_{ij}$. We write $cA = (ca_{ij})$. Thus we multiply
each component of $A$ by $c$.


Example 3. Let $A$, $B$ be as in Example 2. Let $c = 2$. Then



For all matrices $A$, we find that $A + (-1)A = O$.
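
Both operations are componentwise, so they are short to write out. In the sketch below a matrix is represented as a list of rows; this representation, the function names `mat_add` and `scalar_mul`, and the second matrix `B` are assumptions made for illustration (`B` is not the matrix of Example 2).

```python
def mat_add(A, B):
    """Componentwise sum of two matrices of the same size (lists of rows)."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same size")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scalar_mul(c, A):
    """Multiply each component of A by the number c."""
    return [[c * a for a in row] for row in A]

A = [[1, 1, -2], [-1, 4, -5]]   # the 2 x 3 matrix of Example 1
B = [[0, 2, 1], [3, -1, 0]]     # hypothetical second matrix, not from the text

S = mat_add(A, B)                        # [[1, 3, -1], [2, 3, -5]]
twice_A = scalar_mul(2, A)               # [[2, 2, -4], [-2, 8, -10]]
zero = mat_add(A, scalar_mul(-1, A))     # A + (-1)A, the 2 x 3 zero matrix
```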

We leave it as an exercise to verify that all properties VS 1 through VS 8
are satisfied by our rules for addition of matrices and multiplication of
matrices by numbers. The main thing to observe here is that addition
of matrices is defined in terms of the components, and for the addition of
components, the conditions analogous to VS 1 through VS 4 are satisfied.
They are standard properties of numbers. Similarly, VS 5 through VS 8
are true for multiplication of matrices by numbers, because the corresponding
properties for the multiplication of numbers are true.

We see that the matrices (of a given size m × n) form a vector space,
which we may denote by M_{m,n}.

We define one more notion related to a matrix. Let A = (a_{ij}) be an
m × n matrix. The n × m matrix B = (b_{ji}) such that b_{ji} = a_{ij} is called
the transpose of A, and is also denoted by ^tA. Taking the transpose of a
matrix amounts to changing rows into columns and vice versa. If A is
the matrix which we wrote down at the beginning of this section, then ^tA
is the matrix

    \begin{pmatrix}
    a_{11} & a_{21} & a_{31} & \cdots & a_{m1} \\
    a_{12} & a_{22} & a_{32} & \cdots & a_{m2} \\
    \vdots & \vdots & \vdots & & \vdots \\
    a_{1n} & a_{2n} & a_{3n} & \cdots & a_{mn}
    \end{pmatrix}





To take a special case: 



If A = (2, 1, -4) is a row vector, then

    ^tA = \begin{pmatrix} 2 \\ 1 \\ -4 \end{pmatrix}

is a column vector.
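As a small illustration of the transpose, here is a Python sketch in which a matrix is a list of rows. The helper name `transpose` and the 2 × 3 matrix M are assumptions made for this example only.

```python
def transpose(A):
    """Turn the rows of A into columns (A is a list of rows)."""
    return [list(col) for col in zip(*A)]


# The row vector (2, 1, -4), viewed as a 1 x 3 matrix.
row = [[2, 1, -4]]
print(transpose(row))        # [[2], [1], [-4]], a 3 x 1 column vector

# A hypothetical 2 x 3 matrix; its transpose is 3 x 2.
M = [[1, 2, 3], [4, 5, 6]]
print(transpose(M))          # [[1, 4], [2, 5], [3, 6]]
```

Transposing twice gives back the original matrix, which is a quick sanity check on any implementation.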


Exercises

1. Let



Find A + B, 3B, -2B, A + 2B, 2A + B, A - B, A - 2B, B - A.


2. Let 



Find A + B, 3B, -2B, A + 2B, A - B, B - A.

3. In Exercise 1, find ^tA and ^tB.

4. In Exercise 2, find ^tA and ^tB.

5. If A, B are arbitrary m × n matrices, show that ^t(A + B) = ^tA + ^tB.

6. If c is a number, show that ^t(cA) = c ^tA.

7. If A = (a_{ij}) is a square matrix, then the elements a_{ii} are called the
diagonal elements. How do the diagonal elements of A and ^tA differ?

8. Find ^t(A + B) and ^tA + ^tB in Exercise 2.

9. Find A + ^tA and B + ^tB in Exercise 2.

10. A matrix A is said to be symmetric if A = ^tA. Show that for any square
matrix A, the matrix A + ^tA is symmetric.

11. Write down the row vectors and column vectors of the matrices A, B 
in Exercise 1. 

12. Write down the row vectors and column vectors of the matrices A, B 
in Exercise 2. 


§2. Homogeneous linear equations

Let A = (a_{ij}), i = 1, ..., m and j = 1, ..., n be a matrix. Let
b_1, ..., b_m be numbers. Equations like

           a_{11}x_1 + ... + a_{1n}x_n = b_1
    (*)        ...
           a_{m1}x_1 + ... + a_{mn}x_n = b_m




are called linear equations. We also say that (*) is a system of linear
equations. The system is said to be homogeneous if all the numbers
b_1, ..., b_m are equal to 0. The number n is called the number of unknowns,
and m is the number of equations.

The system of equations

            a_{11}x_1 + ... + a_{1n}x_n = 0
    (**)        ...
            a_{m1}x_1 + ... + a_{mn}x_n = 0

will be called the homogeneous system associated with (*). In this section,
we study the homogeneous system (**).

The system (**) always has a solution, namely the solution obtained
by letting all x_i = 0. This solution will be called the trivial solution. A
solution (x_1, ..., x_n) such that some x_i is ≠ 0 is called non-trivial.

We shall be interested in the case when the number of unknowns is
greater than the number of equations, and we shall see that in that case,
there always exists a non-trivial solution.

Before dealing with the general case, we shall study examples.

First, suppose that we have a single equation, like

    2x + y - 4z = 0.

To find a non-trivial solution, we give all the variables except the first
a special value ≠ 0, say y = 1, z = 1. We then solve for x. We find
2x = -y + 4z = 3, whence x = 3/2.

Next, consider a pair of equations, say

    (1)    2x + 3y - z = 0,

    (2)    x + y + z = 0.

We reduce the problem of solving these simultaneous equations to the
preceding case of one equation, by eliminating one variable. Thus we
multiply the second equation by 2 and subtract it from the first equation,
getting

    (3)    y - 3z = 0.

Now we meet one equation in more than one variable. We give z any
value ≠ 0, say z = 1, and solve for y, namely y = 3. We then solve for x
from the second equation, and obtain x = -4. The values which we
have obtained for x, y, z are also solutions of the first equation, because
the first equation is (in an obvious sense) the sum of equation (2) multiplied
by 2, and equation (3).
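The elimination just carried out can be checked numerically. The following Python sketch follows the same two steps for any chosen value of z; the function name `solve_pair` is a hypothetical helper, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def solve_pair(z):
    """Follow the elimination in the text for a chosen value of z."""
    z = Fraction(z)
    y = 3 * z        # equation (3): y - 3z = 0
    x = -y - z       # equation (2): x + y + z = 0
    return x, y, z


x, y, z = solve_pair(1)
print(x, y, z)               # -4 3 1
print(2 * x + 3 * y - z)     # 0, so equation (1) holds as well
```

Any other non-zero choice of z gives another non-trivial solution, a multiple of this one.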





The procedure which we shall use in general is merely the general 
formulation of the elimination carried out above on numerical examples. 

Consider our system of homogeneous equations (**). Let A_1, ..., A_m
be the row vectors of the matrix (a_{ij}). Then we can rewrite our equations
(**) in the form

            A_1·X = 0
    (**)        ...
            A_m·X = 0.

Geometrically, to find a solution of (**) amounts to finding a vector X
which is perpendicular to A_1, ..., A_m. Using the notation of the dot
product will make it easier to formulate the proof of our main theorem,
namely:

Theorem 1. Let

    a_{11}x_1 + ... + a_{1n}x_n = 0
        ...
    a_{m1}x_1 + ... + a_{mn}x_n = 0

be a system of m linear equations in n unknowns, and assume that n > m.
Then the system has a non-trivial solution.

Proof. The proof will be carried out by induction (cf. the Appendix),
i.e. a stepwise procedure.

Consider first the case of one equation in n unknowns, n > 1:

    a_1x_1 + ... + a_nx_n = 0.

If all coefficients a_1, ..., a_n are equal to 0, then any value of the variables
will be a solution, and a non-trivial solution certainly exists. Suppose
that some coefficient a_i is ≠ 0. After renumbering the variables and the
coefficients, we may assume that it is a_1. Then we give x_2, ..., x_n arbitrary
values, for instance we let x_2 = ... = x_n = 1, and solve for x_1, letting

    x_1 = -(1/a_1)(a_2 + ... + a_n).

In that manner, we obtain a non-trivial solution for our system of equations.

Let us now assume that our theorem is true for a system of m - 1
equations in more than m - 1 unknowns. We shall prove that it is true
for m equations in n unknowns when n > m. We consider the system (**).

If all coefficients (a_{ij}) are equal to 0, we can give any non-zero value
to our variables to get a solution. If some coefficient is not equal to 0,
then after renumbering the equations and the variables, we may assume
that it is a_{11}. We shall subtract a multiple of the first equation from the




others to eliminate x_1. Namely, we consider the system of equations

    (A_2 - (a_{21}/a_{11}) A_1)·X = 0
        ...
    (A_m - (a_{m1}/a_{11}) A_1)·X = 0

which can also be written in the form

             A_2·X - (a_{21}/a_{11}) A_1·X = 0
    (***)        ...
             A_m·X - (a_{m1}/a_{11}) A_1·X = 0.

In this system, the coefficient of x_1 is equal to 0. Hence we may view (***)
as a system of m - 1 equations in n - 1 unknowns, and n - 1 > m - 1.

According to our assumption, we can find a non-trivial solution
(x_2, ..., x_n) for this system. We can then solve for x_1 in the first equation,
namely

    x_1 = -(1/a_{11})(a_{12}x_2 + ... + a_{1n}x_n).

In that way, we find a solution of A_1·X = 0. But according to (***),
we have

    A_i·X = (a_{i1}/a_{11}) A_1·X

for i = 2, ..., m. Hence A_i·X = 0 for i = 2, ..., m and therefore
we have found a non-trivial solution to our original system (**).

The argument we have just given allows us to proceed stepwise from
one equation to two equations, then from two to three, and so forth.
This concludes the proof.
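The stepwise procedure of the proof can be turned into a short recursive program. The sketch below is one possible reading of it, not a definitive implementation: the function name `nontrivial_solution` is hypothetical, the pivot is taken wherever a non-zero coefficient is found instead of renumbering, and exact fractions are used.

```python
from fractions import Fraction


def nontrivial_solution(rows, n):
    """Return a non-zero X with A_i . X = 0 for every row A_i.

    rows: list of m coefficient lists of length n, with n > m.
    """
    m = len(rows)
    if n <= m:
        raise ValueError("need more unknowns than equations")
    rows = [[Fraction(a) for a in r] for r in rows]
    # Look for a non-zero coefficient; it plays the role of a_11.
    pivot = next(((i, j) for i in range(m) for j in range(n)
                  if rows[i][j] != 0), None)
    if pivot is None:
        # All coefficients are 0: any non-zero vector is a solution.
        return [Fraction(1)] * n
    i0, j0 = pivot
    first = rows[i0]
    # Subtract a multiple of the pivot equation from the others, then drop
    # the eliminated unknown: m - 1 equations in n - 1 unknowns remain.
    reduced = []
    for r in rows[:i0] + rows[i0 + 1:]:
        f = r[j0] / first[j0]
        new = [a - f * b for a, b in zip(r, first)]
        reduced.append(new[:j0] + new[j0 + 1:])
    rest = nontrivial_solution(reduced, n - 1)
    # Solve the pivot equation for the eliminated unknown.
    s = sum(a * x for a, x in zip(first[:j0] + first[j0 + 1:], rest))
    return rest[:j0] + [-s / first[j0]] + rest[j0:]


X = nontrivial_solution([[2, 3, -1], [1, 1, 1]], 3)
print(X)
```

Applied to the pair of equations 2x + 3y - z = 0, x + y + z = 0 studied earlier, the sketch gives x = -4, y = 3, z = 1.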


Exercises

1. Let V be a subspace of R^n. Let W be the set of elements of R^n which are
perpendicular to every element of V. Show that W is a subspace of R^n.

2. Let A_1, ..., A_r be generators of a subspace V of R^n. Let W be the set of
all elements of R^n which are perpendicular to A_1, ..., A_r. Show that the vectors
of W are perpendicular to every element of V.

3. Interpret the solutions of a homogeneous system of linear equations in
the light of Exercises 1 and 2.

4. Consider the inhomogeneous system (*) consisting of all X such that
X·A_i = b_i for i = 1, ..., m. If X and X' are two solutions of this system,




show that there exists a solution Y of the homogeneous system (**) such that
X' = X + Y. Conversely, if X is any solution of (*), and Y a solution of (**),
show that X + Y is a solution of (*).

§3. Invariance of dimension

This section consists of applications of Theorem 1.

Theorem 2. Let V be a vector space, and let {v_1, ..., v_m} be a basis
of V. Let w_1, ..., w_n be elements of V and assume that n > m. Then
w_1, ..., w_n are linearly dependent.

Proof. Since {v_1, ..., v_m} is a basis, there exist numbers (a_{ij}) such that
we can write

    w_1 = a_{11}v_1 + ... + a_{m1}v_m
        ...
    w_n = a_{1n}v_1 + ... + a_{mn}v_m.

If x_1, ..., x_n are numbers, then

    x_1w_1 + ... + x_nw_n
        = (x_1a_{11} + ... + x_na_{1n})v_1 + ... + (x_1a_{m1} + ... + x_na_{mn})v_m

(just add up the coefficients of v_1, ..., v_m vertically downwards). According
to Theorem 1, the system of equations

    x_1a_{11} + ... + x_na_{1n} = 0
        ...
    x_1a_{m1} + ... + x_na_{mn} = 0

has a non-trivial solution, because n > m. In view of the preceding
remark, such a solution (x_1, ..., x_n) is such that

    x_1w_1 + ... + x_nw_n = 0,

as desired.

Theorem 3. Let V be a vector space and suppose that one basis has
n elements, and another basis has m elements. Then m = n.

Proof. We apply Theorem 2 to the two bases. Theorem 2 implies that
both alternatives n > m and m > n are impossible, and hence m = n.

Let V be a vector space having a basis consisting of n elements. We
shall say that n is the dimension of V. If V consists of 0 alone, then V
does not have a basis, and we shall say that V has dimension 0.

We shall now give criteria which allow us to tell when elements of a
vector space constitute a basis.




Let v_1, ..., v_n be linearly independent elements of a vector space V.
We shall say that they form a maximal set of linearly independent elements
of V if given any element w of V, the elements w, v_1, ..., v_n are linearly
dependent.

Theorem 4. Let V be a vector space, and {v_1, ..., v_n} a maximal set
of linearly independent elements of V. Then {v_1, ..., v_n} is a basis of V.

Proof. We must show that v_1, ..., v_n generate V, i.e. that every element
of V can be expressed as a linear combination of v_1, ..., v_n. Let w
be an element of V. The elements w, v_1, ..., v_n of V must be linearly
dependent by hypothesis, and hence there exist numbers x_0, x_1, ..., x_n not
all 0 such that

    x_0w + x_1v_1 + ... + x_nv_n = 0.

We cannot have x_0 = 0, because if that were the case, we would obtain a
relation of linear dependence among v_1, ..., v_n. Therefore we can solve
for w in terms of v_1, ..., v_n, namely

    w = -(x_1/x_0)v_1 - ... - (x_n/x_0)v_n.

This proves that w is a linear combination of v_1, ..., v_n, and hence that
{v_1, ..., v_n} is a basis.

Theorem 5. Let V be a vector space of dimension n, and let v_1, ..., v_n
be linearly independent elements of V. Then v_1, ..., v_n constitute a basis
of V.

Proof. According to Theorem 2, {v_1, ..., v_n} is a maximal set of linearly
independent elements of V. Hence it is a basis by Theorem 4.

Theorem 6. Let V be a vector space having a basis consisting of n
elements. Let W be a subspace which does not consist of 0 alone. Then W
has a basis, and the dimension of W is ≤ n.

Proof. Let w_1 be a non-zero element of W. If {w_1} is not a maximal set
of linearly independent elements of W, we can find an element w_2 of W
such that w_1, w_2 are linearly independent. Proceeding in this manner one
element at a time, there must be an integer m ≤ n such that we can find
linearly independent elements w_1, w_2, ..., w_m, and such that {w_1, ..., w_m}
is a maximal set of linearly independent elements of W (by Theorem 2,
we cannot go on indefinitely finding linearly independent elements, and
the number of such elements is at most n). If we now use Theorem 4, we
conclude that {w_1, ..., w_m} is a basis for W.






§4. Orthonormal bases

We return to the notion of scalar product. Now that we know what a
vector space is in general, we shall generalize our notion of scalar product
to apply to arbitrary vector spaces.

Let V be a vector space. A scalar product on V is a rule which to any pair
of elements v, w of V associates a number, denoted by ⟨v, w⟩, satisfying
the following properties:

SP 1. We have ⟨v, w⟩ = ⟨w, v⟩.

SP 2. If u, v, w are elements of V, then

    ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩.

SP 3. If x is a number, then

    ⟨xu, v⟩ = x⟨u, v⟩ and ⟨u, xv⟩ = x⟨u, v⟩.

SP 4. If v = 0 then ⟨v, v⟩ = 0, and otherwise, ⟨v, v⟩ > 0.

(It is actually customary to say that the scalar product is positive definite
because of property SP 4, but we shall not deal with any other type, and
hence simply speak of a scalar product.)

The definitions of Chapter I, §4 and the properties proved there apply
to arbitrary scalar products. You should now verify this in detail. For
instance we can define the norm ||v|| of an element of V, by letting
||v|| = √⟨v, v⟩, and the following three properties hold:

For all v in V, we have ||v|| ≥ 0, and ||v|| = 0 if and only if v = 0.

For any number x, we have ||xv|| = |x| ||v||.

For any elements v, w of V, we have ||v + w|| ≤ ||v|| + ||w||.

We also define two elements v, w of V to be perpendicular, or orthogonal,
if ⟨v, w⟩ = 0.

The notation ⟨v, w⟩ is used because in dealing with vector spaces of
functions, it might be confusing to write f·g for the scalar product (i.e.
this might be confused with the ordinary product of functions). However,
in dealing with abstract vector spaces, there is a certain simplicity about
the dot notation, and hence we shall sometimes write v·w instead of
writing ⟨v, w⟩.

We shall also refer to elements of a vector space as vectors. If V is a
vector space with a scalar product, and v is an element of V, then we say
that v is a unit vector if ||v|| = 1 (or equivalently, if ⟨v, v⟩ = 1).

For the rest of this section, we let V be a vector space with a scalar
product. A basis {v_1, ..., v_n} of V is said to be orthogonal if its elements
are mutually perpendicular, i.e. if v_i·v_j = 0 whenever i ≠ j. If in addition
each element of the basis has norm 1, then the basis is called orthonormal.

The unit vectors E_1, ..., E_n of R^n form an orthonormal basis of R^n.

We shall see below that any subspace V of R^n which does not consist of 0
alone has an orthonormal basis.

Theorem 7. Let V be a vector space with a scalar product. Let n be the
dimension of V, and assume n > 0. Let W be a subspace of V, and let
{w_1, ..., w_m} be an orthonormal basis of W. If W ≠ V, then there exist
elements w_{m+1}, ..., w_n of V such that {w_1, ..., w_n} is an orthonormal
basis of V.

Proof. We shall proceed inductively according to the following pattern.
Suppose that we have found elements w_1, ..., w_r of V which are mutually
perpendicular and of norm 1. Let V_r be the subspace of V generated by
w_1, ..., w_r. If V_r is not all of V, let v be an element of V which does not
lie in V_r, i.e. v is not a linear combination of w_1, ..., w_r. We subtract
from v its projections on w_1, ..., w_r. In other words, let

    c_1 = v·w_1, ..., c_r = v·w_r.

Let v' = v - c_1w_1 - ... - c_rw_r. Then v' is perpendicular to w_1, ..., w_r
because for any integer i such that 1 ≤ i ≤ r we have

    v'·w_i = v·w_i - c_i w_i·w_i = c_i - c_i = 0.

Furthermore, v' ≠ 0 (otherwise v would lie in V_r). Let

    w_{r+1} = v'/||v'||.

Then w_{r+1} has norm 1, and is perpendicular to every element w_1, ..., w_r,
hence perpendicular to every element in V_r.

According to Theorem 2, we cannot continue the above procedure indefinitely,
and there is an integer r such that r ≤ n and V_r = V. Then
{w_1, ..., w_r} is an orthonormal basis of V.

Corollary. Let V be a vector space with a scalar product. Let n be the
dimension of V and assume that n > 0. Then V has an orthonormal basis.

Proof. By hypothesis, there exists an element v of V such that v ≠ 0.
We let

    w_1 = v/||v||.

Then ||w_1|| = 1. We let W be the space generated by w_1, and we apply
the theorem to get the desired basis.






Example 1. Find an orthonormal basis for the vector space generated
by the vectors (1, 1, 0, 1), (1, -2, 0, 0), and (1, 0, -1, 2).

Let us denote these vectors by A, B, C. Let

    B' = B - ((B·A)/(A·A)) A.

In other words, we subtract from B its projection along A. Then B' is
perpendicular to A. We find

    B' = (1/3)(4, -5, 0, 1).

Now we subtract from C its projection along A and B', and thus we let

    C' = C - ((C·A)/(A·A)) A - ((C·B')/(B'·B')) B'.

Since A and B' are perpendicular, taking the scalar product of C' with
A and B' shows that C' is perpendicular to both A and B'. We find:

    C' = (1/7)(-4, -2, -7, 6).

The vectors A, B', C' are non-zero and mutually perpendicular. They lie
in the space generated by A, B, C. Hence they constitute an orthogonal
basis for that space. If we wish an orthonormal basis, then we divide
these vectors by their length, and thus obtain

    A/||A|| = (1/√3)(1, 1, 0, 1),

    B'/||B'|| = (1/√42)(4, -5, 0, 1),

    C'/||C'|| = (1/√105)(-4, -2, -7, 6)

as an orthonormal basis.
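The computation of Example 1 can be reproduced with exact arithmetic. The following Python sketch subtracts projections as in the example and postpones the division by the norm; the names `dot` and `orthogonalize` are hypothetical helpers introduced for this illustration.

```python
from fractions import Fraction


def dot(u, v):
    """Ordinary scalar product of two n-tuples."""
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(vectors):
    """Subtract from each vector its projections on the vectors already
    found; vectors that become 0 are dropped. No normalization is done."""
    basis = []
    for v in vectors:
        v = [Fraction(a) for a in v]
        w = v
        for b in basis:
            c = dot(v, b) / dot(b, b)           # projection coefficient
            w = [wi - c * bi for wi, bi in zip(w, b)]
        if any(x != 0 for x in w):
            basis.append(w)
    return basis


A, B1, C1 = orthogonalize([(1, 1, 0, 1), (1, -2, 0, 0), (1, 0, -1, 2)])
print(B1)             # components 4/3, -5/3, 0, 1/3
print(C1)             # components -4/7, -2/7, -1, 6/7
print(dot(B1, B1))    # 14/3, i.e. ||B'|| = (1/3)√42
print(dot(C1, C1))    # 15/7, i.e. ||C'|| = (1/7)√105
```

Dividing each vector by the square root of its scalar product with itself then gives the orthonormal basis.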

In the proof of Theorem 7, we divided the vectors at each step by their
norm. If we wish, we may postpone this step till the end. For instance,
suppose V is a vector space with a scalar product, and we have found
non-zero elements v_1, ..., v_r which are mutually perpendicular. Let v be
another element of V. To "orthogonalize" this new element, we subtract
from it its projections on v_1, ..., v_r. Thus we let

    v' = v - ((v·v_1)/(v_1·v_1)) v_1 - ... - ((v·v_r)/(v_r·v_r)) v_r.

Then v' is perpendicular to v_1, ..., v_r, as one sees at once by taking the
scalar product with these. Then either v' = 0, in which case v is in the
space generated by v_1, ..., v_r, or v' ≠ 0, in which case we have r + 1
linearly independent mutually perpendicular elements v_1, ..., v_r, v'. If
we divide each one of these by its norm, then we obtain vectors of norm 1,
which are again mutually perpendicular.

Let V be a vector space with a scalar product. Let S be a subset of V.
Let U be the set of elements of V which are perpendicular to every element
of S. Then you will verify easily (as an exercise) that U is a subspace
of V. In practice, we shall take S to be a subspace. An element of V which
is perpendicular to every element of S is also said to be perpendicular to S.

Theorem 8. Let V be a vector space with a scalar product, of dimension
n. Let W be a subspace of V of dimension r. Let U be the subspace of V
consisting of all elements which are perpendicular to W. Then U has
dimension n - r.

Proof. If W consists of 0 alone, or if W = V, then our assertion is obvious.
We therefore assume that W ≠ V and that W ≠ {0}. Let
{w_1, ..., w_r} be an orthonormal basis of W. By Theorem 7, there exist
elements u_{r+1}, ..., u_n of V such that

    {w_1, ..., w_r, u_{r+1}, ..., u_n}

is an orthonormal basis of V. We shall prove that {u_{r+1}, ..., u_n} is an
orthonormal basis of U.

Let u be an element of U. Then there exist numbers x_1, ..., x_n such that

    u = x_1w_1 + ... + x_rw_r + x_{r+1}u_{r+1} + ... + x_nu_n.

Since u is perpendicular to W, taking the dot product with any
w_i (i = 1, ..., r), we find

    0 = u·w_i = x_i(w_i·w_i) = x_i.

Hence all x_i = 0 (i = 1, ..., r). Therefore u is a linear combination of
u_{r+1}, ..., u_n.

Conversely, let u = x_{r+1}u_{r+1} + ... + x_nu_n be a linear combination of
u_{r+1}, ..., u_n. Taking the dot product with any w_i yields 0. Hence u is
perpendicular to all w_i (i = 1, ..., r), and hence is perpendicular to W.
This proves that u_{r+1}, ..., u_n generate U. Since they are mutually
perpendicular, and of norm 1, they form an orthonormal basis of U, whose
dimension is therefore n - r.


Example 2. Theorem 8 has an interesting interpretation in terms of
linear equations. Let A_1, ..., A_m be row vectors in R^n. Let X =
(x_1, ..., x_n) as usual. The set of solutions X of the homogeneous system
of linear equations

    (**)    A_1·X = 0, ..., A_m·X = 0



is a vector space. In fact, let W be the space generated by A_1, ..., A_m.
Then the space U consisting of all vectors perpendicular to A_1, ..., A_m is
precisely the vector space of solutions of (**). The vectors A_1, ..., A_m
may not be linearly independent. However, if r is the dimension of W,
then we may now say that the space of solutions has dimension n - r.
Note that r ≤ m. The dimension of U is called the dimension of the space
of solutions of the system of linear equations.
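The formula n - r can be evaluated mechanically once r is known. The sketch below computes r, the dimension of the space generated by the rows, by row reduction. Row reduction is a technique swapped in for convenience here and is not the method of the text; the names `rank` and `solution_space_dimension` and the sample systems are assumptions of this illustration.

```python
from fractions import Fraction


def rank(rows):
    """Dimension r of the space generated by the row vectors,
    computed by elimination with exact fractions."""
    M = [[Fraction(a) for a in r] for r in rows]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][col] / M[r][col]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def solution_space_dimension(rows):
    """n - r for the homogeneous system whose row vectors are given."""
    return len(rows[0]) - rank(rows)


print(solution_space_dimension([[2, 3, -1], [1, 1, 1]]))   # 1
print(solution_space_dimension([[1, 2, 3], [2, 4, 6]]))    # 2
```

In the second sample the two rows are proportional, so r = 1 even though m = 2, and the space of solutions is a plane in R^3.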

Let b_1, ..., b_m be numbers, and consider once more the inhomogeneous
system of linear equations

           A_1·X = b_1
    (*)        ...
           A_m·X = b_m.

It may happen that this system has no solution at all, i.e. that the equations
are inconsistent. For instance, the system

    2x + 3y - z = 1,

    2x + 3y - z = 2

has no solution. However, if there is at least one solution, then all solutions
are obtainable from this one by adding an arbitrary solution of the
associated homogeneous system (**) (cf. Exercise 4 of §2). Hence in this
case again, we can speak of the dimension of the set of solutions. In the
next section, we shall give a criterion which guarantees us the existence
of at least one solution.

Example 3. Consider R^3. Let A, B be two linearly independent vectors
in R^3. Then the space of vectors which are perpendicular to both A and
B is a 1-dimensional space. If {N} is a basis for this space, any other basis
for this space is of type {tN}, where t is a number ≠ 0.

Again in R^3, let N be a non-zero vector. The space of vectors perpendicular
to N is a 2-dimensional space, i.e. a plane, passing through the origin O.

Exercises

1. Let V be a vector space with a scalar product. Show that ⟨0, v⟩ = 0 for
every element v of V.

2. Let V be a vector space with a scalar product, and let v_1, ..., v_r be
non-zero elements of V which are mutually perpendicular. Show that they are
linearly independent.

3. Let V be as in Exercise 2, and let w_1, ..., w_n be elements of V. Let W
be the subspace generated by w_1, ..., w_n. Show that a vector v perpendicular
to each w_i is also perpendicular to W.

4. What is the dimension of the subspace of R^6 perpendicular to the two
vectors (1, 1, -2, 3, 4, 5) and (0, 0, 1, 1, 0, 7)?





5. Let V be a vector space with a scalar product and let W be a subspace,
W ≠ V. Show that there exists a non-zero element of V which is perpendicular
to W (assuming that the dimension of W is finite).

6. Find an orthonormal basis for the subspaces of R^3 generated by the
following vectors: (a) (1, 1, -1) and (1, 0, 1), (b) (2, 1, 1) and (1, 3, -1).

7. Find an orthonormal basis for the subspace of R^4 generated by the vectors
(1, 2, 1, 0) and (1, 2, 3, 1).

8. Find an orthonormal basis for the subspace of R^4 generated by (1, 1, 0, 0),
(1, -1, 1, 1), and (-1, 0, 2, 1).

In the next exercises, we consider the vector space of continuous functions on
the interval [0, 1]. We define the scalar product of two such functions f, g by the
rule

    ⟨f, g⟩ = ∫_0^1 f(t)g(t) dt.

9. Let V be the subspace of functions generated by the two functions
f(t) = t and g(t) = t^2. Find an orthonormal basis for V.

10. Let V be the subspace generated by the three functions 1, t, t^2 (where 1
is the constant function). Find an orthonormal basis for V.

11. What is the dimension of the space of solutions of the following systems
of linear equations:

(a) 2x - 3y + z = 0          (b) 2x + 7y = 0
    x + y - z = 0                x - 2y + z = 0

(c) 2x - 3y + z = 0          (d) x + y + z = 0
    x + y - z = 0                2x + 2y + 2z = 0
    3x + 4y = 0
    5x + y + z = 0

12. Let A be a non-zero vector in n-space. Let P be a point in n-space.
What is the dimension of the set of solutions of the equation

    X·A = P·A?

13. Let A, B be two linearly independent vectors in n-space. What is the
dimension of the space perpendicular to both A and B?

§5. A geometric interpretation

Let A = (a_{ij}), i = 1, ..., m and j = 1, ..., n be a matrix, and let
b_1, ..., b_m be numbers. Consider the system of linear equations

           a_{11}x_1 + ... + a_{1n}x_n = b_1
    (*)        ...
           a_{m1}x_1 + ... + a_{mn}x_n = b_m.

Let X be the vector (x_1, ..., x_n) and let A^1, ..., A^n be the column





vectors of the matrix A. Let B be the column vector

    B = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}.

To say that X is a solution of the system of linear equations is to say that B
is a linear combination of the vectors A^1, ..., A^n, namely

    x_1A^1 + x_2A^2 + ... + x_nA^n = B,

or written in full:

    x_1 \begin{pmatrix} a_{11} \\ \vdots \\ a_{m1} \end{pmatrix} + ... + x_n \begin{pmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{pmatrix} = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}.

Theorem 9. Assume that m = n in the system (*) above, and that the
vectors A^1, ..., A^n are linearly independent. Then the system (*) has a
solution, and this solution is unique.

Proof. From Theorem 5, we know that {A^1, ..., A^n} is a basis of R^n.
Hence any vector can be expressed as a unique linear combination of these
basis vectors, as contended.

For simplicity we shall also denote the system (*) by AX = B.
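The equivalence between the row form of the system and the column form x_1A^1 + ... + x_nA^n = B can be checked on a small case. In the Python sketch below, the matrix A, the vector X, and the helper names `columns` and `combination` are all hypothetical and chosen only for illustration.

```python
def columns(A):
    """Column vectors A^1, ..., A^n of a matrix given as a list of rows."""
    return [list(c) for c in zip(*A)]


def combination(X, cols):
    """The linear combination x_1 A^1 + ... + x_n A^n, component by component."""
    m = len(cols[0])
    return [sum(x * col[i] for x, col in zip(X, cols)) for i in range(m)]


A = [[2, 1],
     [1, 3]]
X = [1, 2]

B = combination(X, columns(A))
row_form = [sum(a * x for a, x in zip(row, X)) for row in A]

print(B)          # [4, 7]
print(row_form)   # [4, 7], the left-hand sides of the equations in (*)
```

Both computations give the same vector, so X solves the system with right-hand side B exactly when B is this combination of the columns.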

Exercises 

1. Let A be a square n × n matrix and let AX = 0 be the associated system
of n linear equations in n unknowns. Prove that if the column vectors of A are 
linearly independent then the only solution is the trivial solution. 

2. Let A be a square matrix again. Prove that if the row vectors of A are 
linearly independent, then the only solution of the system of equations 
AX = 0 is the trivial solution. 



CHAPTER XI 


Linear Mappings 

We shall first define the general notion of a mapping, which generalizes
the notion of a function. Among mappings, the linear mappings are the
most important. A good deal of mathematics is devoted to reducing
questions concerning arbitrary mappings to linear mappings. For one
thing, they are interesting in themselves, and many mappings are linear.
On the other hand, it is often possible to approximate an arbitrary mapping 
by a linear one, whose study is much easier than the study of the original 
mapping. (Cf. Chapter XIII.) 

§1. Mappings

Let S, S' be two sets. A mapping from S to S' is a rule which to every
element of S associates an element of S'. Instead of saying that F is a
mapping from S into S', we shall often write the symbols F: S → S'. A
mapping will also be called a map, for the sake of brevity.

A function is a special type of mapping, namely it is a mapping from
a set into the set of numbers, i.e. into R.

We extend to mappings some of the terminology we have used for functions.
For instance, if T: S → S' is a mapping, and if u is an element of S,
then we denote by T(u), or Tu, the element of S' associated to u by T.
We call T(u) the value of T at u, or also the image of u under T. The
symbols T(u) are read "T of u". The set of all elements T(u), when u
ranges over all elements of S, is called the image of T. If W is a subset
of S, then the set of elements T(w), when w ranges over all elements of W,
is called the image of W under T, and is denoted by T(W).

Example 1. Let S and S' be both equal to R. Let f: R → R be the function
f(x) = x^2 (i.e. the function whose value at a number x is x^2). Then
f is a mapping from R into R.

Example 2. Let S be the set of numbers ≥ 0, and let S' = R. Let
g: S → S' be the function such that g(x) = x^{1/2}. Then g is a mapping
from S into R.

Example 3. Let S be the set of functions having derivatives of all orders
on the interval 0 < t < 1, and let S' = S. Then the derivative D = d/dt
is a mapping from S into S. Indeed, our rule D associates the function
df/dt = Df to the function f. According to our terminology, Df is the
value of the mapping D at f.




Example 4. Let S be the set of continuous functions on the interval
[0, 1] and let S' be the set of differentiable functions on that interval. We
shall define a mapping J: S → S' by giving its value at any function f in S.
Namely, we let Jf (or J(f)) be the function whose value at x is

    ∫_0^x f(t) dt.

Then J(f) is a differentiable function.

Example 5. Let S be the set R^3, i.e. the set of 3-tuples. Let A =
(2, 3, -1). Let L: R^3 → R be the mapping whose value at a vector
X = (x, y, z) is A·X. Then L(X) = A·X. If X = (1, 1, -1), then
the value of L at X is 6.
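The mapping of Example 5 is easy to write down as a program. In this Python sketch, the tuple A is the vector of the example and `L` is the mapping X ↦ A·X; representing vectors as tuples is an assumption of the sketch.

```python
A = (2, 3, -1)


def L(X):
    """Value of the mapping L at X, namely the dot product A . X."""
    return sum(a * x for a, x in zip(A, X))


print(L((1, 1, -1)))   # 6
```

The printed value agrees with the one computed in the example: 2 + 3 + 1 = 6.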

Just as we did with functions, we describe a mapping by giving its
values. Thus, instead of making the statement in Example 5 describing
the mapping L, we would also say: Let L: R^3 → R be the mapping L(X) =
A·X. This is somewhat incorrect, but is briefer, and does not usually
give rise to confusion.

Example 6. Let F: R^2 → R^2 be the mapping given by

    F(x, y) = (2x, 2y).

Describe the image under F of the points lying on the circle x^2 + y^2 = 1.
Let (x, y) be a point on the circle of radius 1.

Let u = 2x and v = 2y. Then u, v satisfy the relation

    (u/2)^2 + (v/2)^2 = 1

or in other words,

    u^2 + v^2 = 4.

Hence (u, v) is a point on the circle of radius 2. Hence the image under F
of the circle of radius 1 is a subset of the circle of radius 2. Conversely,
given a point (u, v) such that

    u^2 + v^2 = 4,

let x = u/2 and y = v/2. Then the point (x, y) satisfies the equation
x^2 + y^2 = 1, and hence is a point on the circle of radius 1. Furthermore,
F(x, y) = (u, v). Hence every point on the circle of radius 2 is the image
of some point on the circle of radius 1. We conclude finally that the image
of the circle of radius 1 under F is precisely the circle of radius 2.

Note. In general, let S, S' be two sets. To prove that S = S', one
frequently proves that S is a subset of S' and that S' is a subset of S. This
is what we did in the preceding argument.
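Both inclusions of the preceding argument can be spot-checked numerically. The Python sketch below samples a few points; the sampling of eight angles and the helper name `F` are choices made for this illustration, and a numerical check of this kind is of course not a proof.

```python
import math


def F(x, y):
    """The mapping F(x, y) = (2x, 2y)."""
    return (2 * x, 2 * y)


for k in range(8):
    t = 2 * math.pi * k / 8

    # A point on the circle of radius 1 is sent to the circle of radius 2.
    u, v = F(math.cos(t), math.sin(t))
    assert math.isclose(u * u + v * v, 4.0)

    # Conversely, a point (u, v) on the circle of radius 2 is the image
    # of (u/2, v/2), which lies on the circle of radius 1.
    u, v = 2 * math.cos(t), 2 * math.sin(t)
    x, y = u / 2, v / 2
    assert math.isclose(x * x + y * y, 1.0)
    assert F(x, y) == (u, v)

print("both inclusions hold on the sampled points")
```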





Example 7. Let S be a set. A mapping from S into R will be called
a function, and the set of such functions will be called the set of functions
defined on S. Let f, g be two functions defined on S. We can define their
sum just as we did for functions of numbers, namely f + g is the function
whose value at an element t of S is f(t) + g(t). We can also define the
product of f by a number c. It is the function whose value at t is cf(t).
Then the set of mappings from S into R is a vector space.

Example 8. Let S be a set and let V be a vector space. Let F, G be
two mappings from S into V. We can define their sum in the same way
as we defined the sum of functions, namely the sum F + G is the mapping
whose value at an element t of S is F(t) + G(t). We also define the product
of F by a number c to be the mapping whose value at an element t of S
is cF(t). It is easy to verify that conditions VS 1 through VS 8 are satisfied.

Example 9. Let F: R → R^n be a mapping. For each number t, the
value of F at t is a vector F(t). The coordinates of F(t) depend on t. Hence
there are functions f_1, ..., f_n such that

    F(t) = (f_1(t), ..., f_n(t)).

Each f_i is a function from R into R. These functions are called the
coordinate functions of F.

Let G: R → R^n be another mapping from R into R^n, and let g_1, ..., g_n
be its coordinate functions. Then

    G(t) = (g_1(t), ..., g_n(t)).

Then

    (F + G)(t) = F(t) + G(t) = (f_1(t) + g_1(t), ..., f_n(t) + g_n(t))

and for any number c,

    (cF)(t) = cF(t) = (cf_1(t), ..., cf_n(t)).

If all the functions f_1, ..., f_n are differentiable, then we say that the
mapping F above is differentiable. The set of all differentiable mappings
from R into R^n is a subspace of the vector space of all mappings.
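The sum of two mappings and the product of a mapping by a number can be modelled directly with functions that return functions. In the Python sketch below, the helper names `add` and `scale` and the two sample mappings from R into R^2 are hypothetical choices for illustration.

```python
def add(F, G):
    """The mapping F + G, whose value at t is F(t) + G(t)."""
    return lambda t: tuple(a + b for a, b in zip(F(t), G(t)))


def scale(c, F):
    """The mapping cF, whose value at t is c F(t)."""
    return lambda t: tuple(c * a for a in F(t))


# Two sample mappings from R into R^2, given by their coordinate functions.
def F(t):
    return (t, t * t)


def G(t):
    return (1, 2 * t)


print(add(F, G)(3))     # (4, 15)
print(scale(2, F)(3))   # (6, 18)
```

The coordinates of the results are the sums and multiples of the coordinate functions, as in the formulas for (F + G)(t) and (cF)(t).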

Exercises

1. In Example 3, give Df when f is the function:

(a) f(x) = sin x    (b) f(x) = e^x    (c) f(x) = log x

2. In Example 4, give J(f) when f is the function:

('■') (*■) fix) = co.sx 





3. In Example 5, give L{X) when A' is the vector: 

(a) (1,2.-3) (b) (-1,5,0) (c) (2,1,1) 

4. Let F:R —»■ be the mapping such that F(t) = (e', 0- ^'hat is F(l), 

F(0), F(-l)? 

5. Let (7:R ^ R^ be the mapping such that G{t) = (I, 2t). Let F be as in 
Exercise 4. \Miat is (F + G)(l), (F -f- G)(2), (F + G)(0)? 

6. Let F be as in Exercise 4. What is {2F)(0), (tF)(1)? 

7. Let .4 = (I, 1, —1, 3). Let F:R^ R be the mapping such that for 

any vector A' = in, Z 2 . xs, xa) we have F(X) = -Y- .4 -r 2. What is the 
value of F(A') when (a) X = (1, 1,0, —1) and (b) A" = (2, 3, 1, 1)? 

(In Exercises 8 through 12, refer to Example 6. In each case, to prove that
the image is equal to a certain set S, you must prove that the image is contained
in S, and also that every element of S is in the image.)

8. Let F: R^2 → R^2 be the mapping defined by F(x, y) = (2x, 3y). Describe
the image of the points lying on the circle x^2 + y^2 = 1.

9. Let F: R^2 → R^2 be the mapping defined by F(x, y) = (xy, y). Describe
the image under F of the straight line x = 2.

10. Let F be the mapping defined by F(x, y) = (e^x cos y, e^x sin y). Describe
the image under F of the line x = 1. Describe more generally the image under
F of a line x = c, where c is a constant.

11. Let F be the mapping defined by F(t, u) = (cos t, sin t, u). Describe
geometrically the image of the (t, u)-plane under F.

12. Let F be the mapping defined by F(x, y) = (x/3, y/4). What is the
image under F of the ellipse

x^2/9 + y^2/16 = 1?

§2. Linear mappings

Let V, V' be two vector spaces. A linear mapping

T: V → V'

is a mapping which satisfies the following two properties. First, for any
elements u, v in V, we have

LM1. T(u + v) = T(u) + T(v).

Secondly, for any number c, we have

LM2. T(cu) = cT(u).

Example 1. Let V be the set of functions which have derivatives of all
orders. Then the derivative D: V → V is a linear mapping. This is simply
a brief way of summarizing properties of the derivative which we have
known a long time, namely

D(f + g) = Df + Dg,

D(cf) = cD(f).

Example 2. Let V = R^3 be the vector space of vectors in 3-space.
Let V' = R^2 be the vector space of vectors in 2-space. We can define a
mapping

F: R^3 → R^2

by the projection, namely F(x, y, z) = (x, y). We leave it to you to check
that the conditions LM1 and LM2 are satisfied.

Example 3. Let A = (1, 2, -1). Let V = R^3 and V' = R. We can
define a mapping L = L_A: R^3 → R by the rule

L(X) = X · A

for any vector X in 3-space. The fact that L is linear summarizes two
known properties of the scalar product, namely, for any two vectors X,
Y and any number c we have

(X + Y) · A = X · A + Y · A,
(cX) · A = c(X · A).
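The two conditions LM1 and LM2 can be spot-checked numerically. The sketch below does this for the map L(X) = X · A with A = (1, 2, -1), and contrasts it with a shifted map that is not linear. The test vectors, the shift by 1, and all helper names are assumptions chosen for illustration, and a check on sample vectors is of course not a proof.

```python
def dot(X, Y):
    return sum(x * y for x, y in zip(X, Y))

def vec_add(X, Y):
    return tuple(x + y for x, y in zip(X, Y))

def vec_scale(c, X):
    return tuple(c * x for x in X)

A = (1, 2, -1)

def L(X):
    """The map L(X) = X . A."""
    return dot(X, A)

def shifted(X):
    """X . A + 1: a hypothetical map that fails LM1 because of the constant."""
    return dot(X, A) + 1

X, Y, c = (3, 0, 4), (-1, 5, 2), 7

print(L(vec_add(X, Y)) == L(X) + L(Y))                    # True  (LM1)
print(L(vec_scale(c, X)) == c * L(X))                     # True  (LM2)
print(shifted(vec_add(X, Y)) == shifted(X) + shifted(Y))  # False
```

A single failing pair, as in the last line, is enough to show that a mapping is not linear.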

Example 4. Let V be any vector space. The mapping which associates
to any element u of V this element itself is obviously a linear mapping,
which is called the identity mapping. We denote it by Id or simply I.
Thus Id(u) = u.

Example 5. Let V, V' be any vector spaces. The mapping which associates
the element 0 in V' to any element u of V is called the zero mapping
and is obviously linear.

Example 6. Let V, V' be two vector spaces. We consider the set of all
linear mappings from V into V', and denote this set by ℒ. We shall
define the addition of linear mappings and their multiplication by numbers
in such a way as to make ℒ into a vector space.

Let T: V → V' and let F: V → V' be two linear mappings. We define
their sum T + F to be the map whose value at an element u of V is
T(u) + F(u). Thus we may write

(T + F)(u) = T(u) + F(u).

The map T + F is then a linear map. Indeed, it is easy to verify that





the two conditions which define a linear map are satisfied. For any
elements u, v of V, we have

(T + F)(u + v) = T(u + v) + F(u + v)

               = T(u) + T(v) + F(u) + F(v)

               = T(u) + F(u) + T(v) + F(v)

               = (T + F)(u) + (T + F)(v).

Furthermore, if c is a number, then

(T + F)(cu) = T(cu) + F(cu)

            = cT(u) + cF(u)

            = c[T(u) + F(u)]

            = c[(T + F)(u)].

Hence T + F is a linear map.

If a is a number, and T: V → V' is a linear map, we define a map aT
from V into V' by giving its value at an element u of V, namely (aT)(u) =
aT(u). Then it is easily verified that aT is a linear map. We leave this as
an exercise.
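The definitions of T + F and aT translate directly into operations on functions. The following sketch is an illustration under assumptions: the two linear maps from R^3 into R^2 and the helper names map_add and map_scale are ours, chosen only to make the definitions concrete.

```python
def T(u):
    """Projection R^3 -> R^2."""
    x, y, z = u
    return (x, y)

def F(u):
    """Another linear map R^3 -> R^2, chosen only for illustration."""
    x, y, z = u
    return (z, x + y)

def map_add(T, F):
    """The map T + F, with (T + F)(u) = T(u) + F(u)."""
    return lambda u: tuple(s + t for s, t in zip(T(u), F(u)))

def map_scale(a, T):
    """The map aT, with (aT)(u) = a * T(u)."""
    return lambda u: tuple(a * t for t in T(u))

S = map_add(T, F)
u, v = (1, 2, 3), (4, -1, 0)
u_plus_v = tuple(p + q for p, q in zip(u, v))

print(S(u), S(v), S(u_plus_v))  # (4, 5) (4, 2) (8, 7)
print(map_scale(5, T)(u))       # (5, 10)
```

In this sample, S(u + v) is the coordinatewise sum of S(u) and S(v), as LM1 for T + F requires.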

We have just defined operations of addition and multiplication by
numbers in our set ℒ. Furthermore, if T: V → V' is a linear map, i.e. an
element of ℒ, then we can define -T to be (-1)T, i.e. the product of
the number -1 by T. Finally, we have the zero-map, which to every
element of V associates the element 0 of V'. Then ℒ is a vector space.
In other words, the set of linear maps from V into V' is itself a vector
space. The verification that the rules VS1 through VS8 for a vector
space are satisfied is easy and left to the reader.

Example 7. Let V = V' be the vector space of functions which have
derivatives of all orders. Let D be the derivative, and let Id be the identity.
If f is in V, then

(D + Id)f = Df + f.

Thus, when f(x) = e^x, then (D + Id)f is the function whose value at x
is e^x + e^x = 2e^x.

If f(x) = sin x, then ((D + Id)f)(x) = cos x + sin x.

We note that 3 · Id is a linear map, whose value at f is 3f. Thus
(D + 3 · Id)f = Df + 3f. At any number x, the value of (D + 3 · Id)f
is Df(x) + 3f(x).

Instead of writing D + 3 · Id, it is customary to abbreviate the notation,
and write D + 3. Thus we would write


(D + 3)f(x) = Df(x) + 3f(x).
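To experiment with an operator such as D + 3, one can restrict attention to polynomials, which have derivatives of all orders. In the sketch below a polynomial is stored as its list of coefficients, lowest degree first. This representation and every function name are assumptions made for the example, not notation from the text.

```python
def D(p):
    """Derivative of the polynomial p[0] + p[1] x + p[2] x^2 + ..."""
    return [k * p[k] for k in range(1, len(p))] or [0]

def Id(p):
    return list(p)

def poly_add(p, q):
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def map_add(S, T):
    """Sum of two maps on polynomials."""
    return lambda p: poly_add(S(p), T(p))

def map_scale(c, T):
    """Product of a map by the number c."""
    return lambda p: [c * a for a in T(p)]

D_plus_3 = map_add(D, map_scale(3, Id))   # the operator D + 3

f = [1, 0, 2]        # f(x) = 1 + 2x^2
print(D_plus_3(f))   # Df + 3f = 3 + 4x + 6x^2, i.e. [3, 4, 6]
```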





Let T: V → V' be a linear mapping. Let u, v, w be elements of V. Then

T(u + v + w) = T(u) + T(v) + T(w).

This can be seen stepwise, using the definition of linear mappings. Thus

T(u + v + w) = T(u + v) + T(w) = T(u) + T(v) + T(w).

Similarly, given a sum of more than three elements, an analogous property
is satisfied. For instance, let u_1, ..., u_n be elements of V. Then

T(u_1 + ... + u_n) = T(u_1) + ... + T(u_n).

The sum on the right can be taken in any order. A formal proof can easily
be given by induction, and we omit it.

If a_1, ..., a_n are numbers, then

T(a_1 u_1 + ... + a_n u_n) = a_1 T(u_1) + ... + a_n T(u_n).

We show this for three elements.

T(a_1 u + a_2 v + a_3 w) = T(a_1 u) + T(a_2 v) + T(a_3 w)

                         = a_1 T(u) + a_2 T(v) + a_3 T(w).

The next theorem will show us how a linear map is determined when
we know its value on basis elements.

Theorem 1. Let V and W be vector spaces. Let {v_1, ..., v_n} be a basis
of V, and let w_1, ..., w_n be arbitrary elements of W. Then there exists a
unique linear mapping T: V → W such that T(v_1) = w_1, ..., T(v_n) =
w_n. If x_1, ..., x_n are numbers, then

T(x_1 v_1 + ... + x_n v_n) = x_1 w_1 + ... + x_n w_n.

Proof. We shall prove that a linear map T satisfying the required
conditions exists. Let v be an element of V, and let x_1, ..., x_n be the unique
numbers such that v = x_1 v_1 + ... + x_n v_n. We let T(v) = x_1 w_1 +
... + x_n w_n. We then have defined a mapping T from V into W, and
we contend that T is linear. If v' is an element of V, and if v' = y_1 v_1 +
... + y_n v_n, then

v + v' = (x_1 + y_1)v_1 + ... + (x_n + y_n)v_n.

By definition, we obtain

T(v + v') = (x_1 + y_1)w_1 + ... + (x_n + y_n)w_n
          = x_1 w_1 + y_1 w_1 + ... + x_n w_n + y_n w_n

          = T(v) + T(v').






Let c be a number. Then cv = cx_1 v_1 + ... + cx_n v_n, and hence

T(cv) = cx_1 w_1 + ... + cx_n w_n = cT(v).

We have therefore proved that T is linear, and hence that there exists a
linear map as asserted in the theorem.

Such a map is unique, because for any element x_1 v_1 + ... + x_n v_n of
V, any linear map F: V → W such that F(v_i) = w_i (i = 1, ..., n) must
also satisfy

F(x_1 v_1 + ... + x_n v_n) = x_1 F(v_1) + ... + x_n F(v_n)

                           = x_1 w_1 + ... + x_n w_n.

This concludes the proof.
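The existence part of the proof is constructive, and can be sketched in code for the special case where V = R^n with its usual basis of unit vectors. The function name linear_map_from_basis_images and the sample images w_1, w_2 are assumptions made for this illustration.

```python
def linear_map_from_basis_images(images):
    """Given w_1, ..., w_n (tuples of equal length), return the map
    T(x_1, ..., x_n) = x_1 w_1 + ... + x_n w_n, which sends the i-th
    unit vector of R^n to w_i."""
    m = len(images[0])

    def T(x):
        return tuple(sum(x_i * w[j] for x_i, w in zip(x, images))
                     for j in range(m))
    return T

# Prescribe the values on the basis: E_1 -> (2, 0, 1), E_2 -> (0, 1, 1).
T = linear_map_from_basis_images([(2, 0, 1), (0, 1, 1)])

print(T((1, 0)), T((0, 1)))  # (2, 0, 1) (0, 1, 1)
print(T((3, 2)))             # 3*(2, 0, 1) + 2*(0, 1, 1) = (6, 2, 5)
```

Once the values on the basis are fixed, the value at every other vector is forced, which is the uniqueness statement of the theorem.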


Exercises 

1. Determine which ones of the following mappings F are linear.

(a) F: R^3 → R^2 defined by F(x, y, z) = (x, z).

(b) F: R^4 → R^4 defined by F(X) = -X.

(c) F: R^3 → R^3 defined by F(X) = X + (0, -1, 0).

(d) F: R^2 → R^2 defined by F(x, y) = (2x + y, y).

(e) F: R^2 → R^2 defined by F(x, y) = (2x, y - x).

(f) F: R^2 → R^2 defined by F(x, y) = (y, x).

(g) F: R^2 → R defined by F(x, y) = xy.

(h) Let U be an open subset of R^3, and let V be the vector space of
differentiable functions on U. Let V' be the vector space of vector
fields on U. Then

grad: V → V'

is a mapping. Is it linear?

2. Let T: V → W be a linear map from one vector space into another. Show
that T(0) = 0.

3. Let T be as in Exercise 2. Let u, v be elements of V, and let Tu = w.
If Tv = 0, show that T(u + v) is also equal to w.

4. Determine all elements z of V such that Tz = w.

5. Let T: V → W be a linear map. Let v be an element of V. Show that

T(-v) = -T(v).

6. Let V be a vector space, and f: V → R, g: V → R two linear mappings.
Let F: V → R^2 be the mapping defined by F(v) = (f(v), g(v)). Show that F
is linear. Generalize.

7. Let V, W be two vector spaces and let F: V → W be a linear map. Let
U be the subset of V consisting of all elements v such that F(v) = 0. Prove
that U is a subspace of V.

8. Which of the mappings in Exercises 4, 7, 8, 9 of §1 are linear?






9. Let F: R^3 → R^4 be a linear map. Let P be a point of R^3, and A a
non-zero element of R^3. Describe the image of the straight line P + tA under F.
[Distinguish the cases when F(A) = 0 and F(A) ≠ 0.]

Let V be a vector space, and let v_1, v_2 be two elements of V which are linearly
independent. The set of elements of V which can be written in the form t_1 v_1 +
t_2 v_2 with numbers t_1, t_2 satisfying 0 ≤ t_1 ≤ 1 and 0 ≤ t_2 ≤ 1, is called a
parallelogram, spanned by v_1, v_2.

10. Let V and W be vector spaces, and let F: V → W be a linear map. Let
v_1, v_2 be linearly independent elements of V, and assume that F(v_1), F(v_2) are
linearly independent. Show that the image under F of the parallelogram spanned
by v_1 and v_2 is the parallelogram spanned by F(v_1), F(v_2).

11. Let F be a linear map from R^2 into itself such that

F(E_1) = (1, 1) and F(E_2) = (-1, 2).

Let S be the square whose corners are at (0, 0), (1, 0), (1, 1), and (0, 1). Show
that the image of this square under F is a parallelogram.

12. Let A, B be two non-zero vectors in the plane such that there is no
constant c ≠ 0 such that B = cA. Let T be a linear mapping of the plane into
itself such that T(E_1) = A and T(E_2) = B. Describe the image under T of
the rectangle whose corners are (0, 1), (3, 0), (0, 0), and (3, 1).

13. Let A, B be two non-zero vectors in the plane such that there is no
constant c ≠ 0 such that B = cA. Describe geometrically the set of points tA +
uB for values of t and u such that 0 ≤ t ≤ 5 and 0 ≤ u ≤ 2.

14. Let S be a set in R^n. We say that S is convex if given two points P, Q
in S, the line segment joining P to Q is contained in S. [The points on this line
segment are those which can be written in the form tP + (1 - t)Q, 0 ≤ t ≤ 1.]
Let L: R^n → R^n be a linear map. Show that the image under L of a convex set
is convex.

15. Let L: R^n → R be a linear map. Let S be the set of all points A in R^n
such that L(A) ≥ 0. Show that S is convex.

16. Let L: R^n → R be a linear map, and let c be a number. Show that the
set S consisting of all points A in R^n such that L(A) > c is convex.

17. Let S_1, S_2 be two convex sets in R^n. Show that the set of points S common
to both S_1 and S_2 is convex.

18. State the results of Exercises 14 through 17 for arbitrary vector spaces.

19. Let A be a non-zero vector in R^n, and c a number. Show that the set of
points X such that X · A ≥ c is convex.

20. Let A, B, C be three distinct points in R^n, satisfying the condition that
B - A and C - A are linearly independent. Show that this condition is
equivalent with the fact that they do not lie on a straight line.

21. Let A, B, C be three points in R^n such that B - A and C - A are
linearly independent. The set of points of type

t_1 A + t_2 B + t_3 C





where t_i are numbers, 0 ≤ t_i for i = 1, 2, 3 and t_1 + t_2 + t_3 = 1 is called a
triangle, determined by A, B, C.

(a) Show that a triangle is convex.

(b) Show that any convex set containing A, B, C also contains the triangle
determined by A, B, C.

(c) Let F: R^n → R^n be a linear map such that F(A), F(B), and F(C)
are distinct and do not lie on a straight line. Show that the image
under F of the triangle determined by A, B, C is the triangle
determined by F(A), F(B), F(C).

(d) Do Exercises 3 and 4 of Appendix 1.

22. Let V, W be two vector spaces, and F: V → W a linear map. Let
w_1, ..., w_n be elements of W which are linearly independent, and let v_1, ..., v_n
be elements of V such that F(v_i) = w_i for i = 1, ..., n. Show that v_1, ..., v_n
are linearly independent.

23. Let V be a vector space and F: V → R a linear map. Let W be the subset
of V consisting of all elements v such that F(v) = 0. Assume that W ≠ V,
and let v_0 be an element of V which does not lie in W. Show that every element
of V can be written as a sum w + cv_0, with some w in W and some number c.

24. In Exercise 23, show that W is a subspace of V. Let {v_1, ..., v_n} be a
basis of W. Show that {v_0, v_1, ..., v_n} is a basis of V.


§3. The kernel of a linear map

Let V, W be vector spaces, and let F: V → W be a linear map. We
contend that the following two conditions are equivalent:

(1) If v is an element of V such that F(v) = 0, then v = 0.

(2) If v, w are elements of V such that F(v) = F(w), then v = w.

To prove our contention, assume first that F satisfies the first condition,
and suppose that v, w are such that F(v) = F(w). Then F(v - w) =
F(v) - F(w) = 0. By assumption, v - w = 0, and hence v = w.

Conversely, assume that F satisfies the second condition. If v is such
that F(v) = F(0) = 0, we conclude that v = 0.

Let F: V → W be as above. The set of elements v of V such that
F(v) = 0 is called the kernel of F. We leave it as an exercise to prove
that the kernel of F is a subspace of V.
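A small numerical illustration of the equivalence, using the projection of R^3 onto R^2 as the (assumed) example: a non-zero element of the kernel immediately produces two distinct vectors with the same image. The specific vectors below are chosen only for illustration.

```python
def F(v):
    """Projection R^3 -> R^2, F(x, y, z) = (x, y)."""
    x, y, z = v
    return (x, y)

k = (0, 0, 5)                            # non-zero, yet F(k) = 0
v = (1, 2, 3)
w = tuple(a + b for a, b in zip(v, k))   # w = v + k, so w != v

print(F(k))                  # (0, 0): k lies in the kernel
print(F(v) == F(w), v == w)  # True False: condition (2) fails
```

So for this map neither condition (1) nor condition (2) holds, in agreement with the equivalence proved above.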

Theorem 2. Let F: V → W be a linear map whose kernel is {0}. If
v_1, ..., v_n are linearly independent elements of V, then F(v_1), ..., F(v_n)
are linearly independent elements of W.

Proof. Let x_1, ..., x_n be numbers such that

x_1 F(v_1) + ... + x_n F(v_n) = 0.

By linearity, we get

F(x_1 v_1 + ... + x_n v_n) = 0.

Hence x_1 v_1 + ... + x_n v_n = 0. Since v_1, ..., v_n are linearly independent
it follows that x_i = 0 for i = 1, ..., n. This proves our theorem.

Example. Let A, B be two linearly independent vectors in n-space.
Let P be a point in n-space. The set of all points

X = P + tA + uB

where t, u range over all numbers is called a plane, passing through P,
parallel to A and B.

Let T: R^n → R^m be a linear mapping, and assume that the kernel of T
is {0}. Then the image of a plane under T is again a plane. Indeed,
consider a plane consisting of all points P + tA + uB as above. Then the
image of such a point is

T(P) + tT(A) + uT(B),

and since the kernel of T is {0}, T(A) and T(B) are linearly independent.
Consequently the image of our plane is a plane passing through T(P),
parallel to T(A) and T(B).

The equation X = P + tA + uB is sometimes called the parametric
equation of a plane.

Let F: V → W be a linear map. The image of F is the set of elements
w in W such that there exists an element v of V such that F(v) = w.
The image of F is a subspace of W. To prove this, observe first that
F(0) = 0, and hence 0 is in the image. Next, suppose that w_1, w_2 are
in the image. Then there exist elements v_1, v_2 of V such that F(v_1) = w_1
and F(v_2) = w_2. Hence F(v_1 + v_2) = F(v_1) + F(v_2) = w_1 + w_2,
thereby proving that w_1 + w_2 is in the image. If c is a number, then
F(cv_1) = cF(v_1) = cw_1. Hence cw_1 is in the image. This proves that
the image is a subspace of W.


Exercises 

1. Let A, B be two vectors in R^2 forming a basis of R^2. Let F: R^2 → R^n
be a linear map. Show that either F(A), F(B) are linearly independent, or the
image of F has dimension 1, or the image of F is {0}.

2. Find a parametric equation for the plane in R^4 passing through the three
points (1, 1, 0, -1), (2, -1, 1, 3), and (4, -2, 1, -1).

3. Let F: V → W be a linear map, whose kernel is {0}. Assume that V and
W have both the same dimension n. Show that the image of F is all of W.





4. Let F: V → W be a linear map and assume that the image of F is all of
W. Assume that V and W have the same dimension n. Show that the kernel
of F is {0}.

5. Let L: V → W be a linear map. Let w be an element of W. Let v_0 be an
element of V such that L(v_0) = w. Show that any solution of the equation
L(X) = w is of type v_0 + u, where u is an element of the kernel of L.

6. Let V be the vector space of functions which have derivatives of all orders,
and let D: V → V be the derivative. What is the kernel of D?

7. Let D^2 be the second derivative (i.e. the iteration of D taken twice).
What is the kernel of D^2? In general, what is the kernel of D^n (n-th derivative)?

8. Let V be as in Exercise 6. We write the functions as functions of a variable
t, and let D = d/dt. Let a_1, ..., a_n be numbers. Let g be an element of V.
Describe how the problem of finding a solution of the differential equation


d7 , 
dr 


d"-'/ 
dt 


m 


_ . . . _j_ — g 


can be interpreted as fitting the abstract situation described in Exercise 5. 

9. Let V, D be as in Exercise 6. Let L = D - I, where I is the identity
mapping of V. What is the kernel of L?

10. Same question for L = D - aI, where a is a number.


§4. Kernel and image

The main theorem relating the kernel and image of a linear map is the
following.


Theorem 3. Let V be a vector space. Let L: V → W be a linear map
of V into another space W. Let n be the dimension of V, q the dimension
of the kernel of L, and s the dimension of the image of L. Then n = q + s.


Proof. If the image of L consists of 0 only, then our assertion is trivial.
We may therefore assume that s > 0. Let {w_1, ..., w_s} be a basis of
the image of L. Let v_1, ..., v_s be elements of V such that L(v_i) = w_i
for i = 1, ..., s. If the kernel of L is not {0}, let {u_1, ..., u_q} be a
basis of the kernel. If the kernel is {0}, it is understood that all reference
to {u_1, ..., u_q} is to be omitted in what follows. We contend that
{v_1, ..., v_s, u_1, ..., u_q} is a basis of V. This will suffice to prove our
assertion. Let v be any element of V. Then there exist numbers x_1, ..., x_s
such that

L(v) = x_1 w_1 + ... + x_s w_s

because {w_1, ..., w_s} is a basis of the image of L. By linearity,

L(v) = L(x_1 v_1 + ... + x_s v_s),





and again by linearity, subtracting the right-hand side from the left-hand
side, it follows that

L(v - x_1 v_1 - ... - x_s v_s) = 0.

Hence v - x_1 v_1 - ... - x_s v_s lies in the kernel of L, and there exist
numbers y_1, ..., y_q such that

v - x_1 v_1 - ... - x_s v_s = y_1 u_1 + ... + y_q u_q.

Hence

v = x_1 v_1 + ... + x_s v_s + y_1 u_1 + ... + y_q u_q

is a linear combination of v_1, ..., v_s, u_1, ..., u_q. This proves that these
s + q elements of V generate V.

We now show that they are linearly independent, and hence that they
constitute a basis. Suppose that there exists a linear relation

x_1 v_1 + ... + x_s v_s + y_1 u_1 + ... + y_q u_q = 0.

Applying L to this relation, and using the fact that L(u_j) = 0 for j =
1, ..., q, we obtain

x_1 L(v_1) + ... + x_s L(v_s) = 0.

But L(v_1), ..., L(v_s) are none other than w_1, ..., w_s, which have been
assumed linearly independent. Hence x_i = 0 for i = 1, ..., s. Hence

y_1 u_1 + ... + y_q u_q = 0.

But u_1, ..., u_q constitute a basis of the kernel of L, and in particular, are
linearly independent. Hence all y_j = 0 for j = 1, ..., q. This concludes
the proof of our assertion.
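The relation n = q + s can be checked on a concrete map X → AX from R^4 into R^3. In the sketch below, q is found by constructing a basis of the kernel and s by row-reducing the columns of A, so the two numbers are computed independently. The matrix A and the helper names rref, kernel_basis and image_dimension are assumptions made for this example, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def rref(rows):
    """Row-reduce a matrix; return (reduced rows, list of pivot columns)."""
    M = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(M[0])):
        if r == len(M):
            break
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(len(M)):
            if i != r:
                factor = M[i][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return M, pivots

def kernel_basis(A):
    """Basis of the solutions X of AX = 0, one vector per non-pivot column."""
    M, pivots = rref(A)
    n = len(A[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[free] = Fraction(1)
        for row, p in zip(M, pivots):
            x[p] = -row[free]
        basis.append(x)
    return basis

def image_dimension(A):
    """Dimension of the space generated by the columns of A."""
    columns = [list(col) for col in zip(*A)]
    return len(rref(columns)[1])

A = [[1, 2, 0, 1],
     [0, 1, 1, 0],
     [1, 3, 1, 1]]   # third row = first row + second row

n = len(A[0])
q = len(kernel_basis(A))
s = image_dimension(A)
print(n, q, s)       # 4 2 2
```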

§5. The rank of a matrix

Let A be an m × n matrix:

    ( a_11  a_12  ...  a_1n )
A = (  ...   ...  ...   ... )
    ( a_m1  a_m2  ...  a_mn )

Let A^1, ..., A^n be its column vectors as usual, and let A_1, ..., A_m be
its row vectors. Then the column vectors generate a subspace of m-space,
and the row vectors generate a subspace of n-space. The column rank of
the matrix is defined to be the maximum number of linearly independent
columns among the column vectors of A. It is also equal to the dimension
of the subspace generated by these column vectors. Similarly, the row





rank of A is defined to be the maximum number of linearly independent
rows among the row vectors of A. It is also equal to the dimension of
the subspace generated by these row vectors. (Cf. Exercise 12 of Chapter
IX, §2.) The main theorem concerning the rank is the following.

Theorem 4. The column rank of a matrix is equal to its row rank.

Proof. Let r be the row rank, and let s be the column rank. According
to Theorem 8 of Chapter X, §4, the dimension of the space of solutions of
the system of homogeneous linear equations determined by A is equal to
n - r. Indeed, it is the dimension of the space of vectors which are
perpendicular to the space generated by the row vectors.

We shall now use another interpretation for this space of solutions.
Let W be the vector space generated by the column vectors of A. According
to Theorem 1 of Chapter XI, §2, there exists a unique linear map
L: R^n → W such that for any vector X = (x_1, ..., x_n) we have

L(X) = x_1 A^1 + ... + x_n A^n.

The image of L is therefore the space generated by the column vectors of
A, and the dimension of this image is s. The kernel of L is by definition
the space of solutions of the linear equations determined by A, namely it
is the space consisting of those vectors X such that

x_1 A^1 + ... + x_n A^n = 0.

According to Theorem 3 of §4, the dimension of this kernel is n - s (we
apply the statement to the case V = R^n). Hence we now see that n - r =
n - s, or in other words, r = s, as was to be shown.

The row rank, or column rank, will simply be called the rank of the
matrix.

Example. What is the rank of the matrix

(  2  0  1 )
( -1  1  5 )?

It is easy to check that the first two columns are linearly independent.
Since the matrix has only two rows, its rank is at most 2. Hence the rank
is 2.

In order to compute easily the rank of a matrix, we observe that the
following operations on the columns of a matrix do not change its rank.

(1) Multiplying one column by a non-zero number.

(2) Interchanging two columns.

(3) Adding one column to another.

[Prove that (1), (2), (3) do not change the rank as an exercise.] Furthermore,
since the row rank is equal to the column rank, the same three
operations applied to rows instead of to columns also do not change the
rank. By applying these operations to a given matrix, it is usually possible
to change the matrix into another one whose rank is more easily
computable.
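As a sketch of how these operations lead to a computation, the function below reduces a matrix to echelon form using row interchanges and the addition of a multiple of one row to another (the latter being a combination of operations (1) and (3)), and then counts the non-zero rows. The function names rank and transpose are ours, and the procedure is ordinary Gaussian elimination, which the text does not spell out. Exact fractions avoid rounding problems.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix, computed by elementary row operations."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        if r == len(M):
            break
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]                 # interchange two rows
        for i in range(r + 1, len(M)):
            factor = -M[i][c] / M[r][c]
            # add (factor times row r) to row i
            M[i] = [a + factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def transpose(rows):
    return [list(col) for col in zip(*rows)]

A = [[2, 0, 1],
     [-1, 1, 5]]
print(rank(A), rank(transpose(A)))  # 2 2
```

Applying the same function to the transpose computes the column rank, so the two printed numbers agree, as Theorem 4 asserts.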

In the proof of Theorem 4, the kernel of our linear map L: R^n → R^m
consists precisely of the solutions X of the system of linear equations

a_11 x_1 + ... + a_1n x_n = 0
...
a_m1 x_1 + ... + a_mn x_n = 0.

Theorem 5. Let U be the space of solutions of the preceding system
of linear equations. Let r be the rank of the matrix (a_ij). Then the
dimension of U is equal to n - r.

Proof. We know that U consists of all vectors perpendicular to the row
vectors of the matrix (a_ij). By Theorem 8 of Chapter X, §4 we know that
dim U = n - r.


Exercises 


1. Find the rank of the following matrices. 





2. Let A, B be two matrices which can be multiplied. Show that rank of
AB ≤ rank of A, and also rank of AB ≤ rank of B. (Do this exercise after
reading the next chapter.)

3. Let A be a triangular matrix:





and assume that none of the diagonal elements is equal to 0. What is the rank
of A?

4. Find the dimension of the space of solutions of the following systems of
linear equations.

(a) 2x + y - z = 0
    y + z = 0

(b) x - y + z = 0

(c) 4x + 7y - πz = 0
    2x - y + z = 0

(d) x + y + z = 0
    x - y = 0
    y + z = 0




§6. Orthogonal maps

All the assertions of this section are very easy to prove, and furnish
pleasant exercises. We shall not spoil your pleasure in working these out,
and we shall only state the results, occasionally giving some hints for the
proofs.

Let V be a vector space with a scalar product, as in Chapter X, §4.
If you wish, you may assume that V = R^n, but this will not make any
statement or proof easier to understand. Let F: V → V be a linear map.
This map may have additional properties, which we shall now describe.
We shall say that F preserves length if we have

||F(v)|| = ||v||

for all v in V. Observe that this is equivalent to saying that

F(v) · F(v) = v · v

for all v in V (i.e. ||F(v)||^2 = ||v||^2 for all v in V).
We shall say that F preserves the scalar product if

F(v) · F(w) = v · w

for all elements v, w of V.

Exercise 1. If F satisfies any one of the above two properties, then it
satisfies the other.

Proof. Hint: To go from the first to the second, use the hypothesis that
||F(v + w)||^2 = ||v + w||^2. To go from the second to the first, recall the
argument of Chapter I which showed that "v ⊥ w" is equivalent with
the relation "||v - w|| = ||v + w||".

A linear map F which satisfies any one of the above two properties,
and hence satisfies both, is called an orthogonal map.
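For a concrete feel for these two properties, the sketch below checks them numerically for a rotation of the plane, and also shows how a scalar product can be recovered from squared lengths, which is the idea behind the hint. The angle 0.7, the test vectors, and the function names are assumptions made for this example. Floating-point results are compared with math.isclose rather than ==.

```python
import math

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def rotation(theta):
    """Rotation of the plane by the angle theta (in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return lambda v: (c * v[0] - s * v[1], s * v[0] + c * v[1])

def dot_from_lengths(v, w):
    """Recover v . w from squared lengths: (|v+w|^2 - |v|^2 - |w|^2) / 2."""
    vw = tuple(a + b for a, b in zip(v, w))
    return (dot(vw, vw) - dot(v, v) - dot(w, w)) / 2

F = rotation(0.7)
v, w = (3.0, 4.0), (-1.0, 2.0)

print(math.isclose(dot(F(v), F(v)), dot(v, v)))  # length is preserved
print(math.isclose(dot(F(v), F(w)), dot(v, w)))  # scalar product is preserved
print(dot_from_lengths(v, w), dot(v, w))         # 5.0 5.0
```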

Exercise 2. Let V be a vector space with a scalar product. Let
{v_1, ..., v_n} and {w_1, ..., w_n} be two orthonormal bases. Let F be a
linear map of V into itself such that

F(v_i) = w_i

for i = 1, ..., n. Prove that F is orthogonal.

Exercise 3. Let V be a vector space with a scalar product. Let
{v_1, ..., v_n} be an orthonormal basis of V. Let F be a linear map of V
into itself which is orthogonal. Show that F(v_1), ..., F(v_n) is also an
orthonormal basis.




Exercise 4. Let V be a vector space of dimension 2, with a scalar product,
and let F be an orthogonal linear map of V into itself. Let {v_1, v_2} and
{w_1, w_2} be two orthonormal bases of V such that F(v_i) = w_i for i = 1, 2.
Let a, b, c, d be numbers such that

w_1 = av_1 + bv_2, w_2 = cv_1 + dv_2.

Show that a^2 + b^2 = 1, c^2 + d^2 = 1, ac + bd = 0, a^2 = d^2, and
c^2 = b^2.

Exercise 5. Let F and V be as in Exercise 4. Assume that ad - bc > 0.
Show that there is a number θ such that

w_1 = (cos θ)v_1 + (sin θ)v_2,
w_2 = (-sin θ)v_1 + (cos θ)v_2.

(Referring to the next chapter, you will see that this means that F is a
rotation. Conversely, when you have read the section on rotations in
the next chapter, deduce that a rotation is an orthogonal map.)

Note. If ad - bc < 0, then the orthogonal map F does not correspond
to a rotation. Give an example of such a map, and interpret it
geometrically.


CHAPTER XII 


Linear Maps and Matrices 

When bases have been selected, it is possible to represent a linear map 
by a matrix, and conversely, every matrix gives rise to a linear map. 
We shall define addition and multiplication of linear maps and matrices, 
and see that these correspond to each other. 

§1. The linear map associated with a matrix

Let V be a vector space, and let {v_1, ..., v_n} be a basis of V. Let W
be a vector space, and let {w_1, ..., w_m} be a basis of W. Let

    ( a_11  a_12  ...  a_1n )
A = (  ...   ...  ...   ... )
    ( a_m1  a_m2  ...  a_mn )

be an m × n matrix. We shall define a linear map

L_A: V → W

depending on A, and the choice of bases for V and W.

Let A_1, ..., A_m be the row vectors of the matrix A. Thus:

A_1 = (a_11, ..., a_1n)
...
A_m = (a_m1, ..., a_mn).

Let v be an element of V, and let X = (x_1, ..., x_n) be its coordinates
with respect to the given basis v_1, ..., v_n. We associate with v the
element L_A(v) of W given by the equation

L_A(v) = (A_1 · X)w_1 + ... + (A_m · X)w_m.

In other words, we may say that the i-th coordinate of L_A(v) is A_i · X.
Thus L_A is a mapping from V into W.

Theorem 1. The mapping L_A is a linear mapping.

Proof. Let u, v be elements of V. There exist unique numbers y_1,
..., y_n and x_1, ..., x_n such that we can write

u = y_1 v_1 + ... + y_n v_n, v = x_1 v_1 + ... + x_n v_n.





Then

u + v = (x_1 + y_1)v_1 + ... + (x_n + y_n)v_n.

Writing X = (x_1, ..., x_n) and Y = (y_1, ..., y_n), we have:

L_A(u + v) = (A_1 · (X + Y))w_1 + ... + (A_m · (X + Y))w_m.

In other words, the i-th coordinate of L_A(u + v) is A_i · (X + Y). We
know that A_i · (X + Y) = A_i · X + A_i · Y. Hence

L_A(u + v) = (A_1 · X)w_1 + ... + (A_m · X)w_m
             + (A_1 · Y)w_1 + ... + (A_m · Y)w_m
           = L_A(u) + L_A(v).


Let c be a number. Then

L_A(cv) = (A_1 · cX)w_1 + ... + (A_m · cX)w_m
        = c(A_1 · X)w_1 + ... + c(A_m · X)w_m
        = cL_A(v).


This proves our theorem.

Example 1. Let V = R^n and W = R^m. We let

E_1 = (1, 0, ..., 0), ..., E_n = (0, ..., 0, 1)

(each vector having n components).

We also let

E_1 = (1, 0, ..., 0), ..., E_m = (0, ..., 0, 1)

(each vector having m components).

Any vector in R^n can then be written in the form

(x_1, ..., x_n) = x_1 E_1 + ... + x_n E_n,

and any vector in R^m can be written in the form

(y_1, ..., y_m) = y_1 E_1 + ... + y_m E_m.

If A is a matrix as before, and L_A(X) = Y, then we see that

y_1 = a_11 x_1 + ... + a_1n x_n
...
y_m = a_m1 x_1 + ... + a_mn x_n.




It is convenient at this point to define the multiplication of a matrix
by a vector. Let A = (a_ij) be an m × n matrix, and let X be a column
vector, with precisely n components.

    ( x_1 )
X = ( ... )
    ( x_n )

Then we define AX to be the column vector

    ( y_1 )
Y = ( ... )
    ( y_m )

whose coordinates y_1, ..., y_m are given by y_i = A_i · X. We see that
the multiplication is obtained by taking the dot product of the rows of
A with the column X:

     ( A_1 · X )
AX = (   ...   )
     ( A_m · X )
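The rule "dot product of each row of A with the column X" can be sketched as follows. The function name mat_vec and the sample 2 × 3 matrix and vector are assumptions chosen for illustration.

```python
def mat_vec(A, X):
    """Return AX: the i-th entry is the dot product of row i of A with X."""
    assert all(len(row) == len(X) for row in A), "row length must equal len(X)"
    return [sum(a * x for a, x in zip(row, X)) for row in A]

A = [[2, 1, -3],
     [1, 2, 4]]
X = [1, 0, 2]
print(mat_vec(A, X))  # [2 + 0 - 6, 1 + 0 + 8] = [-4, 9]
```

The product has as many components as A has rows, and is defined only when the number of columns of A equals the number of components of X.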




Example 2. Let V = R^3 and W = R^2. Let

A = ( 2  1  -3 )
    ( 1  2   4 ).

Let

    ( x_1 )
X = ( x_2 )
    ( x_3 )

be a column vector. Then

AX = ( 2x_1 + x_2 - 3x_3 )
     ( x_1 + 2x_2 + 4x_3 ).


then 





Let A and B be m × n matrices. It is a natural question to ask when
they give rise to the same linear map, i.e. when L_A = L_B. The next
theorem answers this question.

Theorem 2. Let A, B be m × n matrices. Let V, W be two vector
spaces as above, and suppose that bases have been selected as above. If
L_A = L_B, then A = B. In other words, if the matrices give rise to the
same linear map, then they are equal.

Proof. Let {v_1, ..., v_n} and {w_1, ..., w_m} be the bases of V and W
respectively. Assume that L_A = L_B. Let A_1, ..., A_m be the row vectors
of A, and let B_1, ..., B_m be the row vectors of B. For any n-tuple
X = (x_1, ..., x_n), and v = x_1 v_1 + ... + x_n v_n, the expressions of
L_A(v) and L_B(v) as linear combinations of w_1, ..., w_m are equal. Hence
their coordinates with respect to the basis {w_1, ..., w_m} are equal. Hence

A_i · X = B_i · X

for all i = 1, ..., m. Hence (A_i - B_i) · X = 0 for all i and all X. Hence
A_i - B_i = 0, and A_i = B_i for all i. Hence A = B.

It will be a good exercise to prove the next theorem.

Theorem 3. Let V, W be vector spaces. Let {v_1, ..., v_n} be a basis
for V, and let {w_1, ..., w_m} be a basis for W. Let A, B be two m × n
matrices, and let L_A, L_B be the associated linear maps from V into W,
relative to these bases. Then for any element v of V, and any number c, we
have:

L_{A+B}(v) = L_A(v) + L_B(v)

L_{cA}(v) = cL_A(v).

We could omit the v in the statement of Theorem 3, and simply write

L_{A+B} = L_A + L_B,

L_{cA} = cL_A.

In this manner, we see that the rule L, which to each m × n matrix A
associates the linear map L_A, is itself a linear map, from the vector space
of m × n matrices into the vector space of linear maps from R^n into R^m.

Let us denote the vector space of m × n matrices by M_{m,n} and let us
denote by ℒ_{m,n} the vector space of linear maps from R^n into R^m. Then
the assertion of Theorem 3 is equivalent with the assertion that

L: M_{m,n} → ℒ_{m,n}

is itself a linear map.
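The two identities of Theorem 3 can be spot-checked for V = R^n and W = R^m with the usual bases, where L_A(X) is the product AX. The matrices, the vector X, the number c, and the helper names below are assumptions made for this illustration; a numerical check on one example is not a substitute for the proof the text asks for.

```python
def mat_vec(A, X):
    """L_A(X) = AX for the usual bases."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(c, A):
    return [[c * a for a in row] for row in A]

A = [[1, 2], [0, -1], [3, 1]]
B = [[0, 1], [4, 4], [-2, 5]]
X = [2, -3]
c = 6

lhs_sum = mat_vec(mat_add(A, B), X)                              # L_{A+B}(X)
rhs_sum = [a + b for a, b in zip(mat_vec(A, X), mat_vec(B, X))]  # L_A(X) + L_B(X)
print(lhs_sum, rhs_sum)   # [-7, -1, -16] [-7, -1, -16]

lhs_scale = mat_vec(mat_scale(c, A), X)                          # L_{cA}(X)
rhs_scale = [c * a for a in mat_vec(A, X)]                       # c L_A(X)
print(lhs_scale == rhs_scale)  # True
```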




Exercises 


1. In each case, find the vector L_A(X).

w = (i o )-^" C 9’^ 

'-=>'‘ = G 1)-^ = ^“’'^ w^ = (o = 

2. Let X be the indicated column vector, and A the indicated matrix. Find 
AX as a column vector. 




(d) X = 


3. Let 





Find AX for each of the following values of X. 

<.>.-(•} 

4. Let 


7 5 



-1 4 • 



Find AX for each of the values of X given in Exercise 3.

5. Let

X =

What is AX?

6. Let X be a column vector having all its components equal to 0 except the
i-th component which is equal to 1. Let A be an arbitrary matrix, whose size
is such that we can form the product AX. What is AX?



§2. The matrix associated with a linear map

Let V, W be vector spaces. Let F: V → W be a linear map of V into
W. We shall see how we can associate a matrix with F. Such a matrix
will depend on a choice of bases for V, W.




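Let $\{v_1, \ldots, v_n\}$ be a basis of V and let $\{w_1, \ldots, w_m\}$ be a basis of
W. Each one of $F(v_1), \ldots, F(v_n)$ is an element of W. Hence each one
can be written as a linear combination of $w_1, \ldots, w_m$. Thus:

\[
\begin{aligned}
F(v_1) &= a_{11} w_1 + \cdots + a_{m1} w_m \\
&\;\;\vdots \\
F(v_n) &= a_{1n} w_1 + \cdots + a_{mn} w_m.
\end{aligned}
\]

The array

\[
\begin{pmatrix}
a_{11} & a_{21} & \cdots & a_{m1} \\
a_{12} & a_{22} & \cdots & a_{m2} \\
\vdots & \vdots & & \vdots \\
a_{1n} & a_{2n} & \cdots & a_{mn}
\end{pmatrix}
\]

is a matrix. The transpose of this matrix will be called the matrix
associated with the mapping F (relative to our choice of bases).

The transpose of the above matrix is therefore the matrix

\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}.
\]

The reason for taking the transpose will become clear in a moment.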

Let $v = x_1 v_1 + \cdots + x_n v_n$ be an element of V. Since F is linear, we
obtain

\[ F(v) = x_1 F(v_1) + \cdots + x_n F(v_n). \]

Using the expression for $F(v_1), \ldots, F(v_n)$ given above in terms of $w_1, \ldots, w_m$,
we find that

\[ F(v) = x_1 (a_{11} w_1 + \cdots + a_{m1} w_m) + \cdots + x_n (a_{1n} w_1 + \cdots + a_{mn} w_m), \]

and after collecting the coefficients of $w_1, \ldots, w_m$, we can rewrite this
expression in the form

\[ (a_{11} x_1 + \cdots + a_{1n} x_n) w_1 + \cdots + (a_{m1} x_1 + \cdots + a_{mn} x_n) w_m. \]

This is precisely equal to $L_A(v)$. Hence $F = L_A$.

In other words, let v be an element of V. Let X be its (vertical) coordinate
vector relative to $\{v_1, \ldots, v_n\}$. Let A be the matrix associated
to F relative to the chosen bases. Then the coordinate vector of F(v)
relative to the basis $\{w_1, \ldots, w_m\}$ is AX.

In view of Theorem 3 of the preceding section, we see that our matrix
A is the unique matrix such that $F = L_A$. One could also prove the
uniqueness of such a matrix A by a direct argument, using the fact that
the values of F are determined by its values on basis elements.
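
The construction above can be sketched numerically. The following is only an illustration under stated assumptions: the usual bases are used, and the map `F` and the helper names `matrix_of` and `apply` are hypothetical choices that do not come from the text. The sketch places the coordinates of the image of the j-th basis vector in the j-th column, and then checks that the coordinate vector of F(v) equals AX.

```python
def matrix_of(F, n):
    """Matrix of a linear map F: R^n -> R^m relative to the usual bases."""
    # images[j] is F(E_j), where E_j is the j-th unit vector.
    images = [F([1 if i == j else 0 for i in range(n)]) for j in range(n)]
    m = len(images[0])
    # Entry (i, j) is the i-th coordinate of F(E_j): the images become columns.
    return [[images[j][i] for j in range(n)] for i in range(m)]

def apply(A, X):
    """Product AX of a matrix (list of rows) with a column vector (list)."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

# Hypothetical example map F(x1, x2, x3) = (x1 + 2*x2, 3*x3).
def F(x):
    return [x[0] + 2 * x[1], 3 * x[2]]

A = matrix_of(F, 3)
X = [1, -1, 2]
print(A)                      # rows of the associated matrix
print(apply(A, X) == F(X))    # coordinates of F(v) agree with AX
```

Storing the images as columns is exactly the transpose convention described above; storing them as rows would give a matrix that acts correctly only on row vectors multiplied from the left.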




Example 1. Let $F: \mathbf{R}^3 \to \mathbf{R}^2$ be the projection, in other words the
mapping such that $F(x_1, x_2, x_3) = (x_1, x_2)$. Then the matrix associated
with F relative to the usual bases is

\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}. \]

Example 2. Let $F: \mathbf{R}^n \to \mathbf{R}^n$ be the identity. Then the matrix
associated with F relative to the usual bases is the matrix

\[
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}
\]

having components equal to 1 on the diagonal, and 0 otherwise.

This matrix is called the unit matrix (or unit $n \times n$ matrix), and is
sometimes denoted by $I_n$, or I if the dimension is not specified.

Let $\mathcal{B} = \{v_1, \ldots, v_n\}$ be a basis for the vector space V, and let
$\mathcal{B}' = \{w_1, \ldots, w_m\}$ be a basis for the vector space W. We shall denote by

\[ M^{\mathcal{B}}_{\mathcal{B}'}(F) \]

the matrix associated with a linear map F of V into W, relative to the
bases $\mathcal{B}$ and $\mathcal{B}'$. If these bases are fixed throughout a discussion, then we
may write simply M(F).

Warning. Assume that V = W, but that we work with two bases $\mathcal{B}$
and $\mathcal{B}'$ of V which are distinct. Then the matrix associated with the identity
mapping of V into itself relative to these two distinct bases will not
be the unit matrix!

Example 3. Rotations. We shall encounter two situations. First, we
shall pick two different coordinate systems differing by a rotation. The
identity mapping will then have an associated matrix which is not the
unit matrix. Second, we shall discuss the matrix associated with a rotation,
with respect to a fixed basis.

Case 1. Let us start with our coordinate system in the plane as usual.
Let $E_1 = (1, 0)$ and $E_2 = (0, 1)$ be the unit vectors. We consider another
coordinate system obtained by rotating the given coordinate system
counterclockwise by an angle $\theta$. Then the unit vectors are moved into two
new unit vectors $E_1'$ and $E_2'$.







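From the picture, we see that

\[ E_1' = (\cos\theta) E_1 + (\sin\theta) E_2, \]

\[ E_2' = (-\sin\theta) E_1 + (\cos\theta) E_2. \]

If we multiply the first equation by $\cos\theta$, if we multiply the second by
$-\sin\theta$, and add, we find:

\[ E_1 = (\cos\theta) E_1' - (\sin\theta) E_2'. \]

Similarly,

\[ E_2 = (\sin\theta) E_1' + (\cos\theta) E_2'. \]

Let $\mathrm{Id}: \mathbf{R}^2 \to \mathbf{R}^2$ be the identity mapping. Let $\mathcal{B} = \{E_1, E_2\}$ and
$\mathcal{B}' = \{E_1', E_2'\}$. Then:

\[ \mathrm{Id}(E_1) = E_1 = (\cos\theta) E_1' + (-\sin\theta) E_2', \]

\[ \mathrm{Id}(E_2) = E_2 = (\sin\theta) E_1' + (\cos\theta) E_2'. \]

Consequently, the matrix associated with the identity mapping relative
to the bases $\mathcal{B}$ and $\mathcal{B}'$ is

\[ \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}. \]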

Case 2. Let us keep our standard coordinate system, with basis $\mathcal{B} = \{E_1, E_2\}$.
Let $F: \mathbf{R}^2 \to \mathbf{R}^2$ be the mapping obtained by rotating the plane
through an angle $\theta$ (counterclockwise). Then:

\[ F(E_1) = E_1' = (\cos\theta) E_1 + (\sin\theta) E_2, \]

\[ F(E_2) = E_2' = (-\sin\theta) E_1 + (\cos\theta) E_2. \]

Hence the matrix associated with F relative to the bases $\mathcal{B}$, $\mathcal{B}$ is the
transpose of the matrix in Case 1, namely:

\[ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \]

There is no avoiding the fact that the matrix in Case 1 turns out to be
the transpose of the matrix in Case 2. Hence it is necessary always to be
careful of the selection of bases to compute the matrix associated with a
linear map.
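
The two cases can be compared numerically. The sketch below is an illustration only: the angle is an arbitrary sample value, and the names `change_of_basis`, `rotation`, `apply` and `transpose` are hypothetical. It checks that the two matrices are transposes of each other, that the Case 1 matrix sends the old coordinates of the rotated unit vector to (1, 0), and that the Case 2 matrix moves (1, 0) to (cos θ, sin θ).

```python
import math

theta = math.pi / 6  # any angle works; this value is only an example
c, s = math.cos(theta), math.sin(theta)

# Case 1: identity map, from the basis {E1, E2} to the rotated basis {E1', E2'}.
change_of_basis = [[c, s], [-s, c]]
# Case 2: rotation of the plane by theta, relative to the fixed basis {E1, E2}.
rotation = [[c, -s], [s, c]]

def apply(A, X):
    """Product of a matrix (list of rows) with a column vector (list)."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# The two matrices are transposes of each other.
assert transpose(change_of_basis) == rotation

# E1' has coordinates (cos, sin) in the old basis, hence (1, 0) in the new one.
new_coords = apply(change_of_basis, [c, s])
# The rotation itself moves E1 = (1, 0) to (cos, sin).
rotated_e1 = apply(rotation, [1, 0])
```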

The next theorem is the analogue of Theorem 3, §1.

Theorem 4. Let V, W be vector spaces. Let $\mathcal{B}$ be a basis of V, and $\mathcal{B}'$
a basis of W. Let f, g be two linear maps of V into W. Then
M(f + g) = M(f) + M(g). If c is a number, then M(cf) = cM(f). (The associated
matrix is taken relative to the given bases $\mathcal{B}$ and $\mathcal{B}'$.)

The proof will be left as an exercise. 








Exercises 


1. Assume that $\mathbf{R}^n$, $\mathbf{R}^m$ have their usual bases. Find the matrix associated
with the following linear maps.

(a) $F: \mathbf{R}^4 \to \mathbf{R}^2$ given by $F(x_1, x_2, x_3, x_4) = (x_1, x_2)$ (the projection).

(b) The projection from R^ to R®. 

(c) $F: \mathbf{R}^2 \to \mathbf{R}^2$ given by F(x, y) = (3x, 3y).

(d) $F: \mathbf{R}^n \to \mathbf{R}^n$ given by F(X) = 7X.

(e) $F: \mathbf{R}^n \to \mathbf{R}^n$ given by F(X) = −X.

(f) $F: \mathbf{R}^4 \to \mathbf{R}^4$ given by $F(x_1, x_2, x_3, x_4) = (x_1, x_2, 0, 0)$.

2. Let $\mathcal{B} = \{E_1, E_2\}$ be the usual basis of $\mathbf{R}^2$, and let $\mathcal{B}'$ be the basis obtained
after rotating the coordinate system by an angle $\theta$. Find the matrix associated
to the identity relative to $\mathcal{B}$, $\mathcal{B}'$ for each of the following values of $\theta$.

(a) $\pi/2$ (b) $\pi/4$ (c) $\pi$ (d) $-\pi$ (e) $-\pi/3$

(f) $\pi/6$ (g) $5\pi/4$

3. In general, let $\theta > 0$. What is the matrix associated with the identity
map, and rotation of bases by an angle $-\theta$ (i.e. clockwise rotation by $\theta$)?

4. Let X = (1, 2) be a point of the plane. Let F be the rotation through an
angle of $\pi/4$. What are the coordinates of F(X) relative to the usual basis
$\{E_1, E_2\}$?

5. Same question when X = (−1, 3), and F is the rotation through $\pi/2$.

6. In general, let F be the rotation through an angle $\theta$. Let (x, y) be a point
of the plane in the standard coordinate system. Let (x', y') be the coordinates
of this point in the rotated system. Express x', y' in terms of x, y, and $\theta$.

7. Let X = (x, y) and let F be a rotation through an angle $\theta$. Show that
$\|X\| = \|F(X)\|$ (i.e. that F preserves norms).

8. In each of the following cases, let D = d/dt be the derivative. We give a
set of linearly independent functions $\mathcal{B}$. These generate a vector space V, and
D is a linear map from V into itself. Find the matrix associated with D relative
to the bases $\mathcal{B}$, $\mathcal{B}$.

(a) (b) {!.(} (e) {e'.te') 

(d) {1,/, t-; (e) { 1 ,/. c‘, f (f) {sin L cos f) 


9. Let $\mathfrak{M}_{m,n}$ be the vector space of $m \times n$ matrices. What is its dimension?

10. Let V, W be vector spaces of dimensions n, m, respectively. What is the
dimension of the vector space of linear maps from V into W?

11. Let V be a vector space of dimension n. What is the dimension of the
space of linear maps from V into $\mathbf{R}$?


§3. Composition of linear mappings

Let U, V, W be sets. Let $F: U \to V$ be a mapping, and let $G: V \to W$
be a mapping. Then we can form a composite mapping from U into W
in the same way that we formed composite functions. The value of this
composite mapping at an element u of U is G(F(u)). The composite
mapping is defined by the rule: Associate with the element u of U the element
G(F(u)). This composite mapping will be denoted by $G \circ F$.

Example 1. If $f: \mathbf{R} \to \mathbf{R}$ is a function and $g: \mathbf{R} \to \mathbf{R}$ is also a function,
then $g \circ f$ is the composite function, as studied long ago.

Example 2. Let $F: \mathbf{R} \to \mathbf{R}^2$ be the mapping given by

F(t) = 

and let G:R^—>R be the mapping given by Gix,y) = xy. Then 
(?(F{0) = = t\ Thus {G - F)(0 = l^. 

Example 3. Let $X: \mathbf{R} \to \mathbf{R}^3$ be the mapping given by

X(0 = {t, e\ 2 sin 0- 

Let $F: \mathbf{R}^3 \to \mathbf{R}$ be the mapping (function) given by

F(x, y, z) ~ x^y + z. 

Then F(X(o) = + 2 sin 1. We can also write (his (F = A')(0. 

Example 4. Let U be the vector space of differentiable functions (of
one variable t), and let U = V = W. Let $F: U \to U$ be the mapping
which to each function f associates its square [i.e. $F(f) = f^2$], and let
$D: U \to U$ be the derivative. Then for any differentiable function f, we
have

\[ (D \circ F)(f) = 2ff', \]

denoting by f' the derivative of f.

Example 5. Let U = V = W be the vector space of functions having
derivatives of all orders. Let D be the derivative. Then

\[ (D \circ D)(f) = f'' \]

is the second derivative. Also, $(D \circ D \circ D)(f) = f''' = f^{(3)}$ is the third
derivative. In general, we could write $D^n f = f^{(n)}$. Thus $D^n$ is the iteration
of D taken n times.

Theorem 5. Let U, V, W be vector spaces. Let

\[ F: U \to V \quad\text{and}\quad G: V \to W \]

be linear mappings. Then the composite mapping $G \circ F$ is also a linear
mapping.




Proof. This is very easy to prove. Let u, v be elements of U. Since F
is linear, we have F(u + v) = F(u) + F(v). Hence

\[ (G \circ F)(u + v) = G(F(u + v)) = G(F(u) + F(v)). \]

Since G is linear, we obtain

\[ G(F(u) + F(v)) = G(F(u)) + G(F(v)). \]

Hence

\[ (G \circ F)(u + v) = (G \circ F)(u) + (G \circ F)(v). \]

Next, let c be a number. Then

\[
\begin{aligned}
(G \circ F)(cu) &= G(F(cu)) \\
&= G(cF(u)) && \text{(because F is linear)} \\
&= cG(F(u)) && \text{(because G is linear).}
\end{aligned}
\]

This proves that $G \circ F$ is a linear mapping.

The next theorem states that some of the rules of arithmetic concerning
the product and sum of numbers also apply to the composition and
sum of linear mappings.

Theorem 6. Let U, V, W be vector spaces. Let

\[ F: U \to V \]

be a linear mapping, and let G, H be two linear mappings of V into W.
Then

\[ (G + H) \circ F = G \circ F + H \circ F. \]

If c is a number, then

\[ (cG) \circ F = c(G \circ F). \]

If $T: U \to V$ is a linear mapping from U into V, then

\[ G \circ (F + T) = G \circ F + G \circ T. \]

The proofs are all simple. We shall just prove the first assertion and
leave the others as exercises.

Let u be an element of U. We have:

\[
\begin{aligned}
((G + H) \circ F)(u) &= (G + H)(F(u)) = G(F(u)) + H(F(u)) \\
&= (G \circ F)(u) + (H \circ F)(u).
\end{aligned}
\]

By definition, it follows that $(G + H) \circ F = G \circ F + H \circ F$.

It may happen that U = V = W. Let $F: U \to U$ and $G: U \to U$ be
two linear mappings. Then we may form $F \circ G$ and $G \circ F$. It is not
always true that these two composite mappings are equal. As an example,
let $U = \mathbf{R}^3$. Let F be the linear mapping given by

\[ F(x, y, z) = (x, y, 0) \]

and let G be the linear mapping given by

\[ G(x, y, z) = (x, z, 0). \]

Then $(G \circ F)(x, y, z) = (x, 0, 0)$, but $(F \circ G)(x, y, z) = (x, z, 0)$.
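
The failure of commutativity can be checked at a sample point. The sketch below assumes the two maps F(x, y, z) = (x, y, 0) and G(x, y, z) = (x, z, 0); the helper `compose` and the sample point are hypothetical choices made for illustration.

```python
def compose(G, F):
    """Return the composite mapping G o F, i.e. u -> G(F(u))."""
    return lambda u: G(F(u))

def F(p):
    x, y, z = p
    return (x, y, 0)

def G(p):
    x, y, z = p
    return (x, z, 0)

p = (1, 2, 3)
g_after_f = compose(G, F)(p)   # the value (x, 0, 0) at this point
f_after_g = compose(F, G)(p)   # the value (x, z, 0) at this point
print(g_after_f, f_after_g)
```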

The following theorem applies to all mappings. 

Theorem 7. Let U, V, W, S be sets. Let

\[ F: U \to V, \quad G: V \to W, \quad\text{and}\quad H: W \to S \]

be mappings. Then

\[ H \circ (G \circ F) = (H \circ G) \circ F. \]

Proof. Here again, the proof is very simple. By definition, we have, for
any element u of U:

\[ (H \circ (G \circ F))(u) = H((G \circ F)(u)) = H(G(F(u))). \]

On the other hand,

\[ ((H \circ G) \circ F)(u) = (H \circ G)(F(u)) = H(G(F(u))). \]

By definition, this means that $(H \circ G) \circ F = H \circ (G \circ F)$.

§4. Multiplication of matrices

We shall now define the product of matrices. Let $A = (a_{ij})$, $i = 1, \ldots, m$
and $j = 1, \ldots, n$ be an $m \times n$ matrix. Let $B = (b_{jk})$, $j = 1, \ldots, n$
and $k = 1, \ldots, s$ be an $n \times s$ matrix.

We define the product AB to be the $m \times s$ matrix whose ik-coordinate is

\[ \sum_{j=1}^{n} a_{ij} b_{jk} = a_{i1} b_{1k} + a_{i2} b_{2k} + \cdots + a_{in} b_{nk}. \]

If $A_1, \ldots, A_m$ are the row vectors of the matrix A, and if $B^1, \ldots, B^s$
are the column vectors of the matrix B, then the ik-coordinate of the
product AB is therefore equal to $A_i \cdot B^k$. Thus

\[
AB = \begin{pmatrix}
A_1 \cdot B^1 & \cdots & A_1 \cdot B^s \\
\vdots & & \vdots \\
A_m \cdot B^1 & \cdots & A_m \cdot B^s
\end{pmatrix}.
\]

Multiplication of matrices is therefore a generalization of the dot product.
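
As a minimal sketch of this definition, the function below computes each entry of the product as the dot product of a row of the first matrix with a column of the second. The names `dot` and `matmul` and the sample matrices are hypothetical, chosen only for illustration; matrices are represented as lists of rows.

```python
def dot(u, v):
    """Dot product of two vectors of the same length."""
    return sum(a * b for a, b in zip(u, v))

def matmul(A, B):
    """Product of an m x n matrix A and an n x s matrix B (lists of rows)."""
    if len(A[0]) != len(B):
        raise ValueError("number of columns of A must equal number of rows of B")
    columns_of_B = list(zip(*B))
    # The ik-entry is the dot product of the i-th row of A with the k-th column of B.
    return [[dot(row, col) for col in columns_of_B] for row in A]

# Sample matrices, chosen only for illustration.
A = [[2, 1, 5], [1, 3, 2]]        # 2 x 3
B = [[3, 4], [-1, 2], [2, 1]]     # 3 x 2
print(matmul(A, B))               # a 2 x 2 matrix
```

The size check mirrors the requirement that the rows of the first matrix and the columns of the second have the same number of components, since otherwise the dot product is not defined.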


Example 1. Let




Then AB is a 2 X 2 matrix, and computations show that 




Compute (AB)C. What do you find?

Let A be an $m \times n$ matrix and let B be an $n \times 1$ matrix, i.e. a column
vector. Then AB is again a column vector.

Let A be a $1 \times n$ matrix, i.e. a row vector, and let B be an $n \times s$ matrix.
Then AB is a row vector.

If A is a square matrix, then we can form the product AA, which will
be a square matrix of the same size as A. It is denoted by $A^2$. Similarly,
we can form $A^3$, $A^4$, and in general, $A^n$ for any positive integer n. We
define $A^0 = I$ (the unit matrix of the same size as A).


Theorem 8. Let A, B, C be matrices. Assume that A, B can be multiplied,
and A, C can be multiplied, and B, C can be added. Then A, B + C
can be multiplied, and we have

\[ A(B + C) = AB + AC. \]

If x is a number, then

\[ A(xB) = x(AB). \]

Proof. Let $A_i$ be the i-th row of A, and let $B^k$, $C^k$ be the k-th column
of B and C respectively. Then $B^k + C^k$ is the k-th column of B + C.
By definition, the ik-component of AB is $A_i \cdot B^k$, the ik-component of
AC is $A_i \cdot C^k$, and the ik-component of A(B + C) is $A_i \cdot (B^k + C^k)$. Since

\[ A_i \cdot (B^k + C^k) = A_i \cdot B^k + A_i \cdot C^k, \]

our first assertion follows. As for the second, observe that the k-th column
of xB is $xB^k$. Since

\[ A_i \cdot xB^k = x(A_i \cdot B^k), \]

our second assertion follows.

Theorem 9. Let A, B, C be matrices such that A, B can be multiplied
and B, C can be multiplied. Then A, BC can be multiplied, so can AB, C,
and we have

\[ (AB)C = A(BC). \]

Proof. Let $A = (a_{ij})$ be an $m \times n$ matrix, let $B = (b_{jk})$ be an $n \times r$
matrix, and let $C = (c_{kl})$ be an $r \times s$ matrix. The product AB is an
$m \times r$ matrix, whose ik-component is equal to the sum

\[ a_{i1} b_{1k} + a_{i2} b_{2k} + \cdots + a_{in} b_{nk}. \]

We shall abbreviate this sum using our $\sum$ notation by writing

\[ \sum_{j=1}^{n} a_{ij} b_{jk}. \]

By definition, the il-component of (AB)C is equal to

\[ \sum_{k=1}^{r} \left[ \sum_{j=1}^{n} a_{ij} b_{jk} \right] c_{kl} = \sum_{k=1}^{r} \left[ \sum_{j=1}^{n} a_{ij} b_{jk} c_{kl} \right]. \]

The sum on the right can also be described as the sum of all terms

\[ a_{ij} b_{jk} c_{kl} \]

where j, k range over all integers $1 \le j \le n$ and $1 \le k \le r$ respectively.

If we had started with the jl-component of BC and then computed the
il-component of A(BC) we would have found exactly the same sum,
thereby proving the theorem.

We have one final result relating linear maps and matrices.




Theorem 10. Let V, W, U be vector spaces. Let $\mathcal{B}$, $\mathcal{B}'$, $\mathcal{B}''$ be bases
for V, W, U respectively. Let

\[ F: V \to W \quad\text{and}\quad G: W \to U \]

be linear maps. Then

\[ M^{\mathcal{B}'}_{\mathcal{B}''}(G)\, M^{\mathcal{B}}_{\mathcal{B}'}(F) = M^{\mathcal{B}}_{\mathcal{B}''}(G \circ F). \]

(Note. Relative to our choice of bases, the theorem expresses the fact
that composition of mappings corresponds to multiplication of matrices.)

Proof. We shall see that Theorem 10 follows from Theorem 9. Let A
be the matrix associated with F relative to the bases $\mathcal{B}$, $\mathcal{B}'$ and let B be
the matrix associated with G relative to the bases $\mathcal{B}'$, $\mathcal{B}''$. Let v be an
element of V and let X be its (column) coordinate vector relative to $\mathcal{B}$.
Then the coordinate vector of F(v) relative to $\mathcal{B}'$ is AX. By definition, the
coordinate vector of G(F(v)) relative to $\mathcal{B}''$ is B(AX), which, by Theorem
9, is equal to (BA)X. But $G(F(v)) = (G \circ F)(v)$. Hence the coordinate
vector of $(G \circ F)(v)$ relative to the basis $\mathcal{B}''$ is (BA)X. By definition,
this means that BA is the matrix associated with $G \circ F$, and proves our
theorem.
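
The correspondence between composition and matrix multiplication can be checked on a small example. This is a sketch under stated assumptions: the usual bases are used throughout, and the maps `F`, `G` and the helpers `matrix_of`, `matmul` are hypothetical names introduced for illustration.

```python
def matrix_of(F, n):
    """Matrix of a linear map F: R^n -> R^m relative to the usual bases."""
    images = [F([1 if i == j else 0 for i in range(n)]) for j in range(n)]
    # The image of the j-th unit vector becomes the j-th column.
    return [[img[i] for img in images] for i in range(len(images[0]))]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Hypothetical linear maps F: R^2 -> R^3 and G: R^3 -> R^2.
def F(v):
    x, y = v
    return [x + y, 2 * x, -y]

def G(w):
    a, b, c = w
    return [a - c, b + 3 * c]

def G_after_F(v):
    return G(F(v))

A = matrix_of(F, 2)    # 3 x 2 matrix associated with F
B = matrix_of(G, 3)    # 2 x 3 matrix associated with G
# The matrix of the composite is the product BA, in that order.
assert matmul(B, A) == matrix_of(G_after_F, 2)
```

Note the order of the factors: the matrix of the map applied first stands on the right, just as F stands on the right in the composite.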

Remark. In many applications, one deals with linear maps of a vector
space V into itself. If a basis $\mathcal{B}$ of V is selected, and $F: V \to V$ is a linear
map, then the matrix

\[ M^{\mathcal{B}}_{\mathcal{B}}(F) \]

is usually called the matrix associated with F relative to $\mathcal{B}$ (instead of
saying relative to $\mathcal{B}$, $\mathcal{B}$). Cf. Exercise 13 to see how this matrix changes
when the basis $\mathcal{B}$ is changed.


Exercises 

1. Let I be the unit $n \times n$ matrix. Let A be an $n \times r$ matrix. What is IA?
If A is an $n \times n$ matrix, what is AI?

2. Let O be the matrix all of whose coordinates are 0. Let A be a matrix
of a size such that the product AO is defined. What is AO?

3. In each one of the following cases, find (AB)C and A(BC).


w-' = (3= ( J 3 ) 







4. Let A, B be square matrices of the same size, and assume that AB = BA.
Show that $(A + B)^2 = A^2 + 2AB + B^2$, and

\[ (A + B)(A - B) = A^2 - B^2, \]

using the properties of matrices stated in Theorem 8.

5. Let 

Find AB and BA. 

6. Let 5 be as in Exercise 5. Find CA, AC, CB, and 

BC. State the general rule including this exercise as a special case. 

7. Let X = (1, 0, 0) and let

\[ A = \begin{pmatrix} 3 & 1 & 5 \\ 2 & 0 & 1 \\ 1 & 1 & 7 \end{pmatrix}. \]

What is XA?

8. Let X = (0, 1, 0), and let A be an arbitrary $3 \times 3$ matrix. How would
you describe XA? What if X = (0, 0, 1)? Generalize to similar statements
concerning $n \times n$ matrices, and their products with unit vectors.

9. Let A, B be the matrices of Exercise 1(a). Verify by computation that
${}^t(AB) = {}^tB\,{}^tA$. Do the same for 1(b) and 1(c). Prove the same rule for any
two matrices A, B (which can be multiplied). If A, B, C are matrices which can
be multiplied, show that ${}^t(ABC) = {}^tC\,{}^tB\,{}^tA$.

10. Let M be an $n \times n$ matrix such that ${}^tM = M$. Given two row vectors
in n-space, say A and B, define $\langle A, B \rangle$ to be $AM\,{}^tB$. (Identify a $1 \times 1$ matrix
with a number.) Show that the conditions of a scalar product are satisfied,
except possibly the condition concerning positivity. Give an example of a
matrix M and vectors A, B such that $AM\,{}^tB$ is negative (taking n = 2).

11. Let A be the matrix

\[ \begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}. \]

Find $A^2$, $A^3$. Generalize to $4 \times 4$ matrices.

12. Take V = W = U in Theorem 10. Let F and G be both equal to the
identity mapping. Let $\mathcal{B}$, $\mathcal{B}'$ be bases of V. Show that

\[ M^{\mathcal{B}'}_{\mathcal{B}}(\mathrm{Id})\, M^{\mathcal{B}}_{\mathcal{B}'}(\mathrm{Id}) = I, \]

where I is the unit matrix.





13. Let V be a vector space, and $\mathcal{B}$, $\mathcal{B}'$ two bases. Let $F: V \to V$ be a linear
map of V into itself. Let M be the matrix associated with F relative to the
bases $\mathcal{B}$, $\mathcal{B}$, and let M' be the matrix associated with F relative to the bases
$\mathcal{B}'$, $\mathcal{B}'$. Show that there exist matrices A and B such that M' = AMB, and
AB = BA = I. (One usually writes $B = A^{-1}$.)

14. Let $A = (a_{ij})$, $i = 1, \ldots, m$ and $j = 1, \ldots, n$ be an $m \times n$ matrix.
Let $B = (b_{jk})$, $j = 1, \ldots, n$ and $k = 1, \ldots, s$ be an $n \times s$ matrix. Let
AB = C. Show that the k-th column $C^k$ can be written

\[ C^k = b_{1k} A^1 + \cdots + b_{nk} A^n. \]

(This will be useful in finding the determinant of a product.)

§5. Applications to linear equations 

We can give one more interpretation to linear equations, using the 
notions of linear map and multiplication of matrices. 

Let A be an $m \times n$ matrix, and let $L_A$ be the linear map represented
by the matrix A (relative to the usual bases of $\mathbf{R}^n$ and $\mathbf{R}^m$). Let X be a
column vector in n-space. Then

\[ L_A(X) = AX \]

is equal to the product of the matrix A times the vector X. Given a
column vector B in m-space, we can say that the set of solutions of the
inhomogeneous system

\[
\begin{aligned}
a_{11} x_1 + \cdots + a_{1n} x_n &= b_1 \\
&\;\;\vdots \\
a_{m1} x_1 + \cdots + a_{mn} x_n &= b_m
\end{aligned}
\]

consists of all solutions X of the equation $L_A(X) = B$, or in terms of
matrices,

\[ AX = B. \]

If B is the 0-vector, then X is a solution of the homogeneous system
AX = 0, or $L_A(X) = 0$. In that case, we can say that the set of solutions
of the homogeneous system is the kernel of $L_A$. In this manner, we
see once more that it is a vector space.

If C is any solution of the inhomogeneous system AX = B, i.e. if
AC = B, and Y is any solution of the homogeneous system, then C + Y
is a solution of AX = B. Conversely, if C, C' are two solutions of the
equation AX = B, then there exists a solution Y of the homogeneous
system such that C' = C + Y.
Prove these assertions as an exercise (using the present interpretation of
the system of linear equations). It may happen of course that the inhomogeneous
system does not have a solution, i.e. the equations may be
inconsistent. For instance:

\[
\begin{aligned}
2x + 3y &= 1, \\
2x + 3y &= 0
\end{aligned}
\]

does not have a solution, even though the homogeneous system has a
1-dimensional space of solutions.
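
These statements can be illustrated on the single equation 2x + 3y = 1. The sketch below is only an example: the particular solution `C`, the kernel vector `Y` and the helper `apply` are hypothetical choices, and exact rational arithmetic is used so that the equalities hold without rounding.

```python
from fractions import Fraction

def apply(A, X):
    """Product AX of a matrix (list of rows) with a column vector (list)."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

A = [[2, 3]]     # the single equation 2x + 3y = 1
B = [1]
C = [-1, 1]      # one particular solution: 2(-1) + 3(1) = 1
Y = [3, -2]      # a solution of the homogeneous equation: 2(3) + 3(-2) = 0

assert apply(A, C) == B
assert apply(A, Y) == [0]

# C + tY is again a solution for every number t.
shifted = []
for t in (Fraction(1, 2), 2, -5):
    X = [c + t * y for c, y in zip(C, Y)]
    shifted.append(apply(A, X))

# Adding the equation 2x + 3y = 0 gives two equal left-hand sides with
# different right-hand sides, so that system has no solution.
A2, B2 = [[2, 3], [2, 3]], [1, 0]
inconsistent = A2[0] == A2[1] and B2[0] != B2[1]
```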

Exercise 

1. Let A be an $m \times n$ matrix and B a column vector in m-space. If the
system AX = B has a solution, the dimension of the associated homogeneous
system AY = 0 is called the dimension of the space of solutions. Find this
dimension for the following systems of equations.

(a) 2i + 3j/ — z = 1 (b) 2x — y + z = 0 

+ y + z = 5 

(c) —x + 4y + z = 2 (d) z — y + z = 1 

3i+y—z = 0 2z — 3y-f-z = 0 

z + y — z = 5 



CHAPTER XIII 


Applications to Functions of 
Several Variables 

Having acquired the language of linear maps and matrices, we shall be 
able to define the derivative of a mapping, or rather, of a differentiable
mapping. The theoretical considerations involved in the proof of the 
general chain rule of §3 become of course a little abstract. But you should 
note that it is precisely the availability of the notion of linear mapping 
which allows us to give a statement of the chain rule, and a proof, which 
runs exactly parallel to the proof for functions of one variable, as given in 
the First Course. The analysis profits from algebra, and conversely, the 
algebra of linear mappings finds a neat application which enhances its 
attractiveness. 

§1. The derivative as a linear map

We shall interpret our notion of differentiability given in Chapter III 
in terms of linear mappings. 

Let U be an open set in $\mathbf{R}^n$. Let f be a function defined on U. Let P
be a point of U, and assume that f is differentiable at P. Then there is a
vector A, and a function g such that for all small vectors H we can write

\[ f(P + H) = f(P) + A \cdot H + \|H\| g(H), \tag{1} \]

and

\[ \lim_{\|H\| \to 0} g(H) = 0. \tag{2} \]

The vector A, expressed in terms of coordinates, is none other than the
vector of partial derivatives:

\[ A = \operatorname{grad} f(P) = (D_1 f(P), \ldots, D_n f(P)). \]

We have seen in Example 3 of Chapter IX, §3 that there is a linear
map L such that

\[ L(H) = A \cdot H. \]

Our condition that f is differentiable may therefore be expressed by saying
that there is a linear map $L: \mathbf{R}^n \to \mathbf{R}$ and a function g defined for
sufficiently small H, such that

\[ f(P + H) = f(P) + L(H) + \|H\| g(H) \tag{3} \]

and

\[ \lim_{\|H\| \to 0} g(H) = 0. \]

Up to now, we did not define the notion of derivative for functions of
several variables. We now define the derivative of f at P to be this linear
map, which we shall denote by Df(P) or also f'(P). This notation is therefore
entirely similar to the notation used for functions of one variable.
We could not make the definition before we knew what a linear map is.
All the theory developed in Chapters II through VII could be carried out
knowing only dot products, and this is the reason we postponed making
the general definition of derivative until now.

If L is a linear map from one vector space into another, then it will be
useful to omit some parentheses in order to simplify the notation. Thus
we shall sometimes write Lv instead of L(v). With this convention, we
can write (3) in the form

\[ f(P + H) = f(P) + Df(P)H + \|H\| g(H), \tag{4} \]

or also

\[ f(P + H) = f(P) + f'(P)H + \|H\| g(H). \tag{5} \]

These ways of expressing differentiability are those which generalize
to arbitrary mappings.

Let U be an open set in $\mathbf{R}^n$. Let $F: U \to \mathbf{R}^m$ be a mapping. Let P be a
point of U. We shall say that F is differentiable at P if there exists a
linear map

\[ L: \mathbf{R}^n \to \mathbf{R}^m \]

and a mapping G defined for all vectors H sufficiently small, such that we
have

\[ F(P + H) = F(P) + LH + \|H\| G(H) \tag{6} \]

and

\[ \lim_{\|H\| \to 0} G(H) = 0. \tag{7} \]

If such a linear mapping L exists, then we interpret (6) as saying that
L approximates F up to an error term whose magnitude is small, near the
point P.

A linear map L satisfying conditions (6) and (7) will be said to be
tangent to F at P.





Theorem 1. Suppose that there exist linear maps L, M which are tan¬ 
gent to F at P. Then L = M. In other words, if there exists one linear 
map which is tangent to F at P, then there is only one. 

Proof. Suppose that there are two mappings $G_1$, $G_2$ such that for all
sufficiently small H, we have

\[ F(P + H) = F(P) + LH + \|H\| G_1(H), \]

\[ F(P + H) = F(P) + MH + \|H\| G_2(H) \]

and

\[ \lim_{\|H\| \to 0} G_1(H) = 0, \qquad \lim_{\|H\| \to 0} G_2(H) = 0. \]

We must show that for any vector Y we have LY = MY. Let t range
over small positive numbers. Then tY is small, and P + tY lies in U.
Thus F(P + tY) is defined. By hypothesis, we have

\[ F(P + tY) = F(P) + L(tY) + \|tY\| G_1(tY), \]

\[ F(P + tY) = F(P) + M(tY) + \|tY\| G_2(tY). \]

Subtracting, we obtain

\[ 0 = L(tY) - M(tY) + \|tY\| [G_1(tY) - G_2(tY)]. \]

Let $G = G_1 - G_2$. Since L, M are linear, we can write L(tY) = tL(Y)
and M(tY) = tM(Y). Since t is positive, we also have $\|tY\| = t\|Y\|$.
Consequently, we obtain

\[ tM(Y) - tL(Y) = t\|Y\| G(tY). \]

Take $t \ne 0$. Dividing by t yields

\[ M(Y) - L(Y) = \|Y\| G(tY). \]

As t approaches 0, G(tY) approaches 0 also. Hence the right-hand side
of this last equation approaches 0. But M(Y) − L(Y) is a fixed vector.
The only way this is possible is that M(Y) − L(Y) = 0, in other words,
M(Y) = L(Y), as was to be shown.

If there exists a linear map tangent to F at P, we shall denote this
linear map by F'(P), or DF(P) and call it the derivative of F at P. We may
therefore write

\[ F(P + H) = F(P) + F'(P)H + \|H\| G(H) \]

instead of (6).

In the next section, we shall see how the linear map F'(P) can be computed,
or rather how its matrix can be computed when we deal with vectors
as n-tuples.




Exercises 


1. Let $f: \mathbf{R} \to \mathbf{R}$ be a function, and let a be a number. Assume that there
exists a linear map L tangent to f at a. Show that

\[ L(1) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}. \]

2. Conversely, assume that the limit

\[ \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} \]

exists and is equal to a number b. Let $L_b$ be the linear map such that $L_b(x) = bx$
for all numbers x. Show that $L_b$ is tangent to f at a. It is customary to identify
the number b and the linear map $L_b$, and to call either one the derivative of f
at a.

3. Going back to Chapter II, let X(t) be a curve, defined for all numbers t,
say. Discuss in a manner analogous to Exercises 1 and 2 the derivative dX/dt,
and the linear map $L_t: \mathbf{R} \to \mathbf{R}^n$ which is tangent to X at t.


§2. The Jacobian matrix 

Throughout this section, all our vectors will be vertical vectors. We
let $D_1, \ldots, D_n$ be the usual partial derivatives. Thus $D_i = \partial/\partial x_i$.

Let $F: \mathbf{R}^n \to \mathbf{R}^m$ be a mapping. We can represent F by coordinate
functions. In other words, there exist functions $f_1, \ldots, f_m$ such that

\[ F(X) = \begin{pmatrix} f_1(X) \\ \vdots \\ f_m(X) \end{pmatrix} = {}^t(f_1(X), \ldots, f_m(X)). \]

To simplify the typography, we shall sometimes write a vertical vector
as the transpose of a horizontal vector, as we have just done.

We view X as a column vector, $X = {}^t(x_1, \ldots, x_n)$.

Let us assume that the partial derivatives of each function $f_i$
($i = 1, \ldots, m$) exist. We can then form the matrix of partial derivatives:

\[
\begin{pmatrix}
D_1 f_1(X) & \cdots & D_n f_1(X) \\
\vdots & & \vdots \\
D_1 f_m(X) & \cdots & D_n f_m(X)
\end{pmatrix}
= (D_j f_i(X)),
\]

$i = 1, \ldots, m$ and $j = 1, \ldots, n$. This matrix is called the Jacobian
matrix of F, and is denoted by $M_F(X)$.


Example 1. Let be the mapping defined by 




Find the Jacobian matrix $M_F(P)$ for P = (1, 1).

The Jacobian matrix at an arbitrary point (x, y) is



Hence when x = 1, y = 1, we find:





Example 2. Let $F: \mathbf{R}^2 \to \mathbf{R}^3$ be the mapping defined by



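Find $M_F(P)$ at the point $P = (\pi, \pi/2)$.

The Jacobian matrix at an arbitrary point (x, y) is

\[ \begin{pmatrix} y & x \\ \cos x & 0 \\ 2xy & x^2 \end{pmatrix}. \]

Hence

\[ M_F(\pi, \pi/2) = \begin{pmatrix} \pi/2 & \pi \\ -1 & 0 \\ \pi^2 & \pi^2 \end{pmatrix}. \]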

Theorem 2. Let U be an open set in $\mathbf{R}^n$. Let $F: U \to \mathbf{R}^m$ be a mapping,
having coordinate functions $f_1, \ldots, f_m$. Assume that each function $f_i$ is
differentiable at a point X of U. Then F is differentiable at X, and the
matrix representing the linear map DF(X) = F'(X) relative to the usual
bases is the Jacobian matrix $M_F(X)$.

Proof. For each integer i between 1 and m, there is a function $g_i$ such that

\[ \lim_{\|H\| \to 0} g_i(H) = 0, \]

and such that we can write

\[ f_i(X + H) = f_i(X) + \operatorname{grad} f_i(X) \cdot H + \|H\| g_i(H). \]

We view X and F(X) as vertical vectors. By definition, we can write

\[ F(X + H) = {}^t(f_1(X + H), \ldots, f_m(X + H)). \]

Hence,

\[
F(X + H) = F(X) +
\begin{pmatrix} \operatorname{grad} f_1(X) \cdot H \\ \vdots \\ \operatorname{grad} f_m(X) \cdot H \end{pmatrix}
+ \|H\| \begin{pmatrix} g_1(H) \\ \vdots \\ g_m(H) \end{pmatrix}.
\]

The term in the middle, involving the gradients, is precisely equal to the
product of the Jacobian matrix, times H, i.e. to

\[ M_F(X)H. \]

Let $G(H) = {}^t(g_1(H), \ldots, g_m(H))$ be the vector on the right. Then

\[ F(X + H) = F(X) + M_F(X)H + \|H\| G(H). \]

As $\|H\|$ approaches 0, each coordinate of G(H) approaches 0. Hence
G(H) approaches 0; in other words,

\[ \lim_{\|H\| \to 0} G(H) = 0. \]

Hence the linear map represented by the matrix $M_F(X)$ is tangent to F
at X. Since such a linear map is unique, we have proved our theorem.
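
The tangency condition can be observed numerically. The sketch below is an illustration under stated assumptions: the map `F` is a hypothetical example from the plane into 3-space, `jacobian` is its matrix of partial derivatives computed by hand, and `error_ratio` is a hypothetical helper. It evaluates the norm of F(X + H) − F(X) − M_F(X)H divided by the norm of H, which should shrink as H shrinks.

```python
import math

def F(x, y):
    # Hypothetical map R^2 -> R^3, chosen only for illustration.
    return [x * y, math.sin(x), x * x * y]

def jacobian(x, y):
    # Rows are the gradients of the three coordinate functions of F.
    return [[y, x], [math.cos(x), 0.0], [2 * x * y, x * x]]

def error_ratio(x, y, h, k):
    """||F(X+H) - F(X) - M_F(X)H|| / ||H|| for X = (x, y), H = (h, k)."""
    J = jacobian(x, y)
    diff = [a - b - (row[0] * h + row[1] * k)
            for a, b, row in zip(F(x + h, y + k), F(x, y), J)]
    return math.sqrt(sum(d * d for d in diff)) / math.sqrt(h * h + k * k)

# The ratio plays the role of ||G(H)|| and should tend to 0 with H.
ratios = [error_ratio(1.0, 2.0, t, -t) for t in (1e-1, 1e-2, 1e-3)]
print(ratios)
```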


Exercises

1. In each of the following cases, compute the Jacobian matrix of F.

(a) $F(x, y) = (x + y, x^2 y)$ (b) $F(x, y) = (\sin x, \cos xy)$

(c) $F(x, y) = (e^{xy}, \log x)$ (d) $F(x, y, z) = (xz, xy, yz)$

(e) $F(x, y, z) = (xyz, x^2 z)$ (f) $F(x, y, z) = (\sin xyz, xz)$

2. Find the Jacobian matrix of the mappings in Exercise 1 evaluated at the
following points.

(a) (1, 2) (b) $(\pi, \pi/2)$ (c) (1, 4) (d) (1, 1, −1)

(e) (2, −1, −1) (f) $(\pi, 2, 4)$

Let A, B be two linearly independent elements in $\mathbf{R}^n$. Let P be a point in $\mathbf{R}^n$.
The set of points of type

\[ P + tA + uB, \]

where t, u are numbers, is called a plane spanned by A, B (or parallel to A and
B), and passing through P. Let V, W be vector spaces. Let $L: V \to W$ be a
linear map. We recall that the kernel of L is the set of all elements C of V such
that L(C) = 0.

3. Let A, B be two linearly independent elements of $\mathbf{R}^n$. Let $L: \mathbf{R}^n \to \mathbf{R}^n$
be a linear map, whose kernel consists of 0 only. Show that the image under
L of the plane spanned by A and B and passing through P is also a plane.

4. Let A, B, L be as in Exercise 3. Let C be any vector in $\mathbf{R}^n$. Let F be the
mapping defined by F(X) = L(X) + C. Show that the image under F of the
plane passing through P and parallel to A, B is also a plane.

Let U be an open set in $\mathbf{R}^2$, and let $F: U \to \mathbf{R}^n$ be a differentiable map. The
image of F is called a (parametric) surface, parametrized by F. Let P be a point
of U. Let L be the derivative of F at P, i.e. L = F'(P). Assume that the
kernel of L is equal to 0. The plane passing through F(P) and parallel to $L(E_1)$,
$L(E_2)$ is called the tangent plane to the surface at P.

5. Let $F: \mathbf{R}^2 \to \mathbf{R}^3$ be the mapping defined by

F{t, «) = (/u, 

Find the equation of the tangent plane to the surface at (1, 2). 

6. Let $F: \mathbf{R}^2 \to \mathbf{R}^3$ be the mapping defined by

\[ F(t, u) = (\cos t, \sin t, u). \]

Find the equation of the tangent plane to the surface at $(\pi, \pi/4)$.

7. Let $F: \mathbf{R}^2 \to \mathbf{R}^3$ be the mapping defined by

F{t, «) = (/, tu, tu+ I). 

Find the equation of the tangent plane to the surface at (2, —1). 

8. Let U be the open set in $\mathbf{R}^2$ determined by the conditions

\[ 0 < \theta < 2\pi, \quad 0 < r \]

[using $(\theta, r)$ as coordinates]. Let $F: U \to \mathbf{R}^2$ be the mapping defined by

\[ x = r \cos\theta, \quad y = r \sin\theta. \]

Find the Jacobian matrix of this mapping. After you have read the chapter
on determinants, find the determinant of this matrix.

9. Let U be the open set in $\mathbf{R}^3$ determined by the conditions

O<0<2t, 0<p 

(using (0, p) as coordinates!. Let Fit' —» R^ be the mapping defined by 

\[ x = \rho \sin\varphi \cos\theta, \quad y = \rho \sin\varphi \sin\theta, \quad z = \rho \cos\varphi. \]

Find the Jacobian matrix of this mapping. After you have read the chapter on
determinants, find the determinant of this matrix.





§3. The chain rule

In the First Course, we proved a chain rule for composite functions. 
Earlier in this book, a chain rule was given for a composite of a function 
and a vector. In this section, we give a general formulation of the chain 
rule for arbitrary compositions of mappings. 

We shall need an auxiliary statement. 

Lemma. Let L:R" —» R”* be a linear mapping. There eiists a number 
h such that, for any vector X, we have 

\\LX\\ g b\\Xl 

Proof. We know that the mapping $L$ can be represented by a matrix
$A = (a_{ij})$. Let $c$ be the maximum of all absolute values $|a_{ij}|$ of all the
components of the matrix $A$. Let $X = {}^t(x_1, \ldots, x_n)$. Then $L(X) = AX$.
The components of $AX$ are given by

$$a_{11}x_1 + \cdots + a_{1n}x_n, \quad \ldots, \quad a_{m1}x_1 + \cdots + a_{mn}x_n.$$

Let us estimate these components, for instance the first,

$$|a_{11}x_1 + \cdots + a_{1n}x_n| \le |a_{11}x_1| + \cdots + |a_{1n}x_n|
  \le c|x_1| + \cdots + c|x_n|
  \le nc\,\|X\|$$

because $|x_j| \le \|X\|$ for all $j$. We obtain a similar estimate for all
components of $AX$.

Since $AX$ has $m$ components, we find therefore

$$\|AX\|^2 = (a_{11}x_1 + \cdots + a_{1n}x_n)^2 + \cdots + (a_{m1}x_1 + \cdots + a_{mn}x_n)^2
  \le m(nc)^2\,\|X\|^2.$$

Taking the square root, we obtain

$$\|AX\| \le b\,\|X\|$$

with the constant $b = \sqrt{m(nc)^2} = \sqrt{m}\,nc$. Since $L(X) = AX$, we have
proved our lemma.
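
The estimate in the lemma can be checked numerically. The following is a minimal sketch in Python, not part of the original text: the matrix `A` is a hypothetical example, and the helper names `mat_vec`, `norm` and `lemma_bound` are assumptions introduced only for illustration. It uses the constant $b = \sqrt{m}\,nc$ for a matrix with $m$ rows and $n$ columns.

```python
import math
import random

def mat_vec(A, X):
    """Return the product AX for a matrix A (list of rows) and a vector X."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

def norm(X):
    """Euclidean length of the vector X."""
    return math.sqrt(sum(x * x for x in X))

def lemma_bound(A):
    """Constant b = sqrt(m) * n * c, where c is the largest |a_ij|."""
    m, n = len(A), len(A[0])
    c = max(abs(a) for row in A for a in row)
    return math.sqrt(m) * n * c

# Hypothetical 2 x 3 matrix, chosen only for illustration.
A = [[1.0, -2.0, 3.0],
     [0.5, 4.0, -1.0]]
b = lemma_bound(A)

# Test the inequality ||AX|| <= b ||X|| on random vectors.
random.seed(0)
for _ in range(1000):
    X = [random.uniform(-10, 10) for _ in range(3)]
    assert norm(mat_vec(A, X)) <= b * norm(X) + 1e-12
```

The bound is far from sharp; the lemma only needs the existence of some constant $b$.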

Let $U$ be an open set in $\mathbf{R}^n$, and let $V$ be an open set in $\mathbf{R}^m$. Let
$F: U \to \mathbf{R}^m$ be a mapping, and assume that all values of $F$ are contained
in $V$. Let $G: V \to \mathbf{R}^s$ be a mapping. Then we can form the composite
mapping $G \circ F$ from $U$ into $\mathbf{R}^s$.

Let $X$ be a point of $U$. Then $F(X)$ is a point of $V$ by assumption. Let
us assume that $F$ is differentiable at $X$, and that $G$ is differentiable at
$F(X)$. We know that $F'(X)$ is a linear map from $\mathbf{R}^n$ into $\mathbf{R}^m$, and $G'(F(X))$
is a linear map from $\mathbf{R}^m$ into $\mathbf{R}^s$. Thus we may compose these two linear
maps to give a linear map

$$G'(F(X)) \circ F'(X)$$

from $\mathbf{R}^n$ into $\mathbf{R}^s$.



The next theorem tells us what the derivative of $G \circ F$ is in terms of
the derivative of $F$ at $X$, and the derivative of $G$ at $F(X)$. Please observe
how the statement and proof of the theorem will be entirely parallel to
the statement and proof of the theorem for the chain rule in the First
Course.

Theorem 3. Let $U$ be an open set in $\mathbf{R}^n$, let $V$ be an open set in $\mathbf{R}^m$.
Let $F: U \to \mathbf{R}^m$ be a mapping such that all values of $F$ are contained in $V$.
Let $G: V \to \mathbf{R}^s$ be a mapping. Let $X$ be a point of $U$ such that $F$ is
differentiable at $X$. Assume that $G$ is differentiable at $F(X)$. Then the
composite mapping $G \circ F$ is differentiable at $X$, and its derivative is given by

$$(G \circ F)'(X) = G'(F(X)) \circ F'(X).$$

Proof. By definition of differentiability, there exists a mapping $\varphi_1$
such that

$$\lim_{\|H\| \to 0} \varphi_1(H) = 0$$

and

$$F(X + H) = F(X) + F'(X)H + \|H\|\,\varphi_1(H).$$

Similarly, writing $Y = F(X)$, there exists a mapping $\varphi_2$ such that

$$\lim_{\|K\| \to 0} \varphi_2(K) = 0$$

and

$$G(Y + K) = G(Y) + G'(Y)K + \|K\|\,\varphi_2(K).$$

We let $K = K(H)$ be

$$K = F(X + H) - F(X) = F'(X)H + \|H\|\,\varphi_1(H).$$

Then:

$$G(F(X + H)) = G(F(X) + K)
  = G(F(X)) + G'(F(X))K + \|K\|\,\varphi_2(K).$$

Using the fact that $G'(F(X))$ is linear, we can write:

$$(G \circ F)(X + H) = (G \circ F)(X) + G'(F(X))F'(X)H
  + \|H\|\,G'(F(X))\varphi_1(H) + \|H\|\,\frac{\|K\|}{\|H\|}\,\varphi_2(K).$$

If we put

$$\Phi(H) = G'(F(X))\varphi_1(H) + \frac{\|K\|}{\|H\|}\,\varphi_2(K),$$

then I contend that

$$\lim_{\|H\| \to 0} \Phi(H) = 0.$$

In fact, by the lemma, there is a number $c$ such that

$$\|G'(F(X))\varphi_1(H)\| \le c\,\|\varphi_1(H)\|,$$

and as $\|H\|$ approaches 0, the right-hand side of this inequality approaches 0.

Furthermore, $H/\|H\|$ has length 1. From the definition of $K(H)$ we have
$K/\|H\| = F'(X)(H/\|H\|) + \varphi_1(H)$, and by the lemma applied to $F'(X)$
the first term has length at most some fixed number $b$. Hence the quotient
$K/\|H\|$ remains bounded. As $\|H\|$ approaches 0, so does $\|K\|$, and hence so
does $\varphi_2(K)$. Consequently the second term

$$\frac{\|K\|}{\|H\|}\,\varphi_2(K)$$

appearing in the expression for $\Phi(H)$ approaches 0 as $\|H\|$ approaches 0.
We have therefore shown that

$$\lim_{\|H\| \to 0} \Phi(H) = 0.$$

Since we have

$$(G \circ F)(X + H) = (G \circ F)(X) + G'(F(X))F'(X)H + \|H\|\,\Phi(H),$$

from the definition of differentiability, and tangent linear map, we
conclude that the linear map

$$G'(F(X))F'(X)$$

is tangent to $G \circ F$ at $X$. It must therefore be equal to $(G \circ F)'(X)$, as
was to be shown.
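
The theorem says that the Jacobian matrix of $G \circ F$ is the product of the Jacobian matrices of $G$ and $F$. Here is a minimal numerical sketch in Python, under stated assumptions: the maps `F` (polar coordinates, written with arguments in the order $(r, \theta)$) and `G` are hypothetical examples chosen for illustration, and the derivative of the composite is approximated by central differences rather than computed exactly.

```python
import math

def F(r, theta):
    """Hypothetical inner map R^2 -> R^2 (polar coordinates)."""
    return (r * math.cos(theta), r * math.sin(theta))

def G(x, y):
    """Hypothetical outer map R^2 -> R^2."""
    return (x * y, x + y)

def jac_F(r, theta):
    """Jacobian matrix of F, computed by hand."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]

def jac_G(x, y):
    """Jacobian matrix of G, computed by hand."""
    return [[y, x],
            [1.0, 1.0]]

def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def numeric_jacobian(func, point, h=1e-6):
    """Central-difference approximation to the Jacobian of func at point."""
    n = len(point)
    cols = []
    for j in range(n):
        plus, minus = list(point), list(point)
        plus[j] += h
        minus[j] -= h
        fp, fm = func(*plus), func(*minus)
        cols.append([(a - b) / (2 * h) for a, b in zip(fp, fm)])
    # cols[j][i] approximates the partial of the i-th component in the j-th variable
    return [[cols[j][i] for j in range(n)] for i in range(len(cols[0]))]

def composite(r, theta):
    """The composite map G o F."""
    return G(*F(r, theta))

def chain_rule_jacobian(r, theta):
    """G'(F(X)) F'(X), as a product of Jacobian matrices."""
    return mat_mul(jac_G(*F(r, theta)), jac_F(r, theta))
```

Comparing `chain_rule_jacobian(2.0, 0.7)` with `numeric_jacobian(composite, (2.0, 0.7))` entry by entry should show agreement up to the error of the finite-difference approximation.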

Remark. (To be read after the chapter on determinants.) You will
recall that the chain rule in one variable had an analogue for integration,
which was called "substitution" in the First Course. There is also an
analogue in the present multidimensional case. Although we shall not state
it with full precision, because it would require some notions which we
have not discussed, nevertheless it is illuminating to see roughly in what
context it holds, and we shall now describe this context.

Let $U$, $V$ be two open sets in $n$-space. Let

$$G: U \to V \quad\text{and}\quad F: V \to U$$

be two differentiable mappings, whose partial derivatives are continuous,
and such that the composite mapping $G \circ F$ is the identity mapping of $V$,
and the composite mapping $F \circ G$ is the identity mapping of $U$. We then
say that $F$, $G$ are inverse to each other.

Denote the coordinate vector of a point in $U$ by $X = (x_1, \ldots, x_n)$,
and the coordinate vector of a point of $V$ by $Y = (y_1, \ldots, y_n)$. Let $f$ be a
function on $V$. Then $f \circ G$ is a function on $U$, namely the function such
that

$$(f \circ G)(X) = f(G(X)) = f(Y).$$

Let $J_G(X)$ denote the determinant of the Jacobian matrix of $G$ at $X$,
and call it the Jacobian determinant. Then $J_G$ is a function on $U$. For a
suitably restricted class of functions $f$, the chain rule for integration asserts
that

$$\int \cdots \int_U f(G(X))\,J_G(X)\,dX = \int \cdots \int_V f(Y)\,dY,$$

provided that the Jacobian determinant is positive at each point of $U$.
Here, we abbreviate $dx_1 \cdots dx_n$ by $dX$, and similarly for $dY$. The
integrals are multiple integrals, which are direct generalizations of the double
integral in rectangular coordinates discussed in Chapter VIII, §1.

The integrals given in terms of polar and spherical coordinates can then
be viewed as special cases of the above general chain rule. (Cf. Exercises
8 and 9.)

To develop the theory of integration and to give a proof for the chain
rule requires a fairly elaborate machinery, which belongs properly to an
Advanced Calculus course. Still, it may be quieting to many readers to
see here already a more analytical reason for the integral given in terms of
polar and spherical coordinates, than the geometric plausibility arguments
given in Chapter VIII.



CHAPTER XIV


Determinants


We have worked with vectors for some time, and we have often felt
the need of a method to determine when vectors are linearly independent.
Up to now, the only method available to us was to solve a system of
linear equations by the elimination method. In this chapter, we shall
exhibit a very efficient computational method to solve linear equations,
and determine when vectors are linearly independent.

It is sufficient to understand §1, §2, and §3 to be able to work with
determinants. The formal rules which are used to compute them are very
easy to state. Hence the proofs of §5 through §9, which require a somewhat
higher level of abstraction, may be omitted without prejudice to the
understanding of the computational aspects of determinants. The slight
complication of notation is unavoidable when dealing with n X n matrices.


§1. Determinants of order 2

Before stating the general properties of an arbitrary determinant, we
shall consider a special case.

Let

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

be a 2 X 2 matrix. We define its determinant to be $ad - cb$. Thus the
determinant is a number.

The determinant can be viewed as a function of the matrix $A$. It can
also be viewed as a function of its two columns. Let these be $A^1$ and
$A^2$ as usual. Then we write the determinant as

$$D(A), \quad \mathrm{Det}(A), \quad\text{or}\quad D(A^1, A^2).$$


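The following properties are easily verified by direct computation,
which you should carry out completely.

(1) As a function of the column vectors, the determinant is linear.
This means: let $b'$, $d'$ be two numbers. Then

$$\mathrm{Det}\begin{pmatrix} a & b + b' \\ c & d + d' \end{pmatrix}
  = \mathrm{Det}\begin{pmatrix} a & b \\ c & d \end{pmatrix}
  + \mathrm{Det}\begin{pmatrix} a & b' \\ c & d' \end{pmatrix}.$$

Furthermore, if $t$ is a number, then

$$\mathrm{Det}\begin{pmatrix} a & tb \\ c & td \end{pmatrix}
  = t\,\mathrm{Det}\begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$

The analogous properties also hold with respect to the first column.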

(2) If the two columns are equal, then the determinant is equal to 0. 

(3) If $A$ is the unit matrix,

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

then $\mathrm{Det}(A) = 1$.

The determinant also satisfies the following additional properties. 

(4) If one adds a multiple of one column to the other, then the value 
of the determinant does not change. 

In other words, let $t$ be a number. The determinant of the matrix

$$\begin{pmatrix} a + tb & b \\ c + td & d \end{pmatrix}$$

is the same as $D(A)$, and similarly when we add a multiple of the first
column to the second.

(5) If the two columns are interchanged, then the determinant changes 
by a sign. 

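In other words, we have

$$\mathrm{Det}\begin{pmatrix} b & a \\ d & c \end{pmatrix}
  = -\mathrm{Det}\begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$

(6) The determinant of $A$ is equal to the determinant of its transpose,
i.e. $D(A) = D({}^tA)$.

Explicitly, we have

$$\mathrm{Det}\begin{pmatrix} a & b \\ c & d \end{pmatrix}
  = \mathrm{Det}\begin{pmatrix} a & c \\ b & d \end{pmatrix}.$$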

In the next section, we shall consider determinants of n X n matrices, 
and the analogous properties will give us a method for computing the 
determinant in general. 

§2. Properties of determinants 

Let A be an n X n matrix. It would be possible to define its determinant
by a sum, just as we defined the determinant of a 2 X 2 matrix. However,
to write such a sum is a little complicated, and it turns out that to
find the value of a determinant, it is not necessary to have this expression.
What is needed is a set of properties which can be used to compute it.





The determinant, denoted by $D$, or Det, is a function of square matrices,
satisfying the properties listed below. In order to write down these
properties, we settle first some notation.

Let $A = (a_{ij})$ be a square n X n matrix. Its determinant will be
written $D(A)$, $\mathrm{Det}(A)$, or will also be denoted by surrounding the matrix
with two vertical bars:

$$\mathrm{Det}(a_{ij}) = \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix}.$$

If $A^1, \ldots, A^n$ are the column vectors of the matrix, we also write

$$D(A^1, \ldots, A^n) \quad\text{instead of}\quad D(A).$$

We can now list the properties. 

(1) As a function of the column vectors, the determinant is linear.
This means: suppose that the $j$-th column $A^j$ is equal to a sum of two
column vectors, say $A^j = C + C'$. Then

$$D(A^1, \ldots, C + C', \ldots, A^n)
  = D(A^1, \ldots, C, \ldots, A^n) + D(A^1, \ldots, C', \ldots, A^n).$$

Furthermore, if $t$ is a number, then

$$D(A^1, \ldots, tA^j, \ldots, A^n) = t\,D(A^1, \ldots, A^j, \ldots, A^n).$$

Observe that if we let $t = 0$, then we conclude that when one of the
columns is the zero vector, the determinant is equal to 0.

(2) If two columns are equal, then D{A) = 0. 

(3) If A is the unit matrix, then D(A) = 1. 

The main theorem in this chapter will be the following.

There exist determinants satisfying properties (1), (2), and (3). Such
determinants are uniquely determined, and also satisfy properties (4)
through (7) below.

Since properties (1) through (7) are the only ones needed to compute 
determinants, we shall state the rest of them and give some applications 
before considering the existence and uniqueness proofs. 

(4) If one adds a multiple of one column to another column, then the
value of the determinant does not change.

(5) If two adjacent columns are interchanged, then the determinant
changes by a sign.

(6) The determinant of A is equal to the determinant of its transpose.

In view of (6), we conclude that the determinant satisfies properties
(1) through (5) with respect to rows, i.e. each one of these properties is
valid if we replace the word "column" by the word "row".




It is actually useful in practice to have one more rule, which gives us
an analogue of the definition $ad - cb$ for 2 X 2 matrices. To explain
this additional rule, we need one more notion.

Let $i$, $j$ be a pair of integers between 1 and $n$. If we cross out the $i$-th
row and $j$-th column in the n X n matrix $A$, we obtain an (n - 1) X
(n - 1) matrix, which we shall denote by $A_{ij}$. It looks like this:



Example 1. Let 



Our last rule may now be stated. It is called the expansion rule according
to the $i$-th row.

(7) Let $A$ be an n X n matrix as before. Then the determinant $D(A)$
is equal to:

$$(-1)^{i+1}a_{i1}\,\mathrm{Det}(A_{i1}) + (-1)^{i+2}a_{i2}\,\mathrm{Det}(A_{i2})
  + \cdots + (-1)^{i+n}a_{in}\,\mathrm{Det}(A_{in})
  = \sum_{j=1}^{n} (-1)^{i+j}a_{ij}\,\mathrm{Det}(A_{ij}).$$

This sum can be described in words. For each element of the $i$-th row,
we have a contribution of one term in the sum. This term is equal to +
or - the product of this element, times the determinant of the matrix
obtained from $A$ by deleting the $i$-th row and the corresponding column.
The sign + or - is determined according to the chess-board pattern:

$$\begin{pmatrix} + & - & + & \cdots \\ - & + & - & \cdots \\ + & - & + & \cdots \\ \vdots & \vdots & \vdots & \end{pmatrix}$$

In view of (6), we could also expand the determinant according to the 
j-th column in an analogous manner. 





Example 2. We shall write out the expansion of a 3 X 3 determinant 
according to the first column. 

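Let

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$

Then $D(A)$ is equal to the sum:

$$a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
  - a_{21}\begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix}
  + a_{31}\begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}.$$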

With all the above means at our disposal, we can now compute
determinants very efficiently. In doing so, we try to apply the operations
described in (4) to make as many entries in the matrix as possible equal to 0.
We try especially to make all but one element of a column (or row) equal to 0,
and then expand according to that column (or row). The expansion will
contain only one term, and reduces our computation to a determinant of
smaller size.


Example 3. Compute

$$\begin{vmatrix} 3 & 0 & 1 \\ 1 & 2 & 5 \\ -1 & 4 & 2 \end{vmatrix}.$$

We already have 0 in the first row. We subtract twice the second row
from the third row. Our determinant is then equal to

$$\begin{vmatrix} 3 & 0 & 1 \\ 1 & 2 & 5 \\ -3 & 0 & -8 \end{vmatrix}.$$

We expand according to the second column. The expansion has only
one term $\ne 0$, with a + sign, and that is:

$$2\begin{vmatrix} 3 & 1 \\ -3 & -8 \end{vmatrix}.$$

The 2 X 2 determinant can be evaluated by our definition $ad - cb$, and
we find $2(-24 - (-3)) = -42$.
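
The expansion rule (7) translates directly into a recursive procedure. The following is a minimal sketch in Python, not part of the original text: the function names `minor` and `det` are assumptions introduced for illustration, indices are 0-based (which does not change the parity of $i + j$), and the determinant of the empty matrix is taken to be 1 so that the recursion ends cleanly.

```python
def minor(A, i, j):
    """The matrix A_ij: delete row i and column j of A (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A, i=0):
    """Determinant by expansion according to row i, as in rule (7)."""
    if not A:
        return 1  # convention for the empty matrix, used to end the recursion
    return sum((-1) ** (i + j) * A[i][j] * det(minor(A, i, j))
               for j in range(len(A)))

# The matrix of Example 3.
A = [[3, 0, 1],
     [1, 2, 5],
     [-1, 4, 2]]
print(det(A))  # -42
```

Expanding according to any other row, for instance `det(A, 1)`, gives the same value. This direct recursion takes on the order of n! steps, so for large matrices one would first create zeros with property (4), as in Example 3.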

Exercises 


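1. Compute the following determinants.

$$\text{(a)}\ \begin{vmatrix} 2 & 1 & 2 \\ 0 & 3 & -1 \\ 4 & 1 & 1 \end{vmatrix}
\qquad \text{(b)}\ \begin{vmatrix} 3 & -1 & 5 \\ -1 & 2 & 1 \\ -2 & 4 & 3 \end{vmatrix}
\qquad \text{(c)}\ \begin{vmatrix} 2 & 4 & 3 \\ -1 & 3 & 0 \\ 0 & 2 & 1 \end{vmatrix}$$

$$\text{(d)}\ \begin{vmatrix} 1 & 2 & -1 \\ 0 & 1 & 1 \\ 0 & 2 & 7 \end{vmatrix}
\qquad \text{(e)}\ \begin{vmatrix} -1 & 5 & 3 \\ 4 & 0 & 0 \\ 2 & 7 & 8 \end{vmatrix}$$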

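2. Compute the following determinants.

$$\text{(a)}\ \begin{vmatrix} 1 & 1 & -2 & 4 \\ 0 & 1 & 1 & 3 \\ 2 & -1 & 1 & 0 \\ 3 & 1 & 2 & 5 \end{vmatrix}
\qquad \text{(b)}\ \begin{vmatrix} -1 & 1 & 2 & 0 \\ 0 & 3 & 2 & 1 \\ 0 & 4 & 1 & 2 \\ 3 & 1 & 5 & 7 \end{vmatrix}$$

$$\text{(c)}\ \begin{vmatrix} 3 & 1 & 1 \\ 2 & 5 & 5 \\ 8 & 7 & 7 \end{vmatrix}
\qquad \text{(d)}\ \begin{vmatrix} 4 & -9 & 2 \\ 4 & -9 & 2 \\ 3 & 1 & 0 \end{vmatrix}$$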

3. Make up matrices yourself and find their determinants until you feel 
that you can compute determinants rapidly. 

4. (a) Write out the expansion of a 3 X 3 determinant according to the
second row in a manner similar to that of Example 2. (b) Write out the general
formula for the expansion of an n X n matrix according to the i-th row.

5. Let $x_1$, $x_2$, $x_3$ be numbers. Show that

$$\begin{vmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ 1 & x_3 & x_3^2 \end{vmatrix}
  = (x_2 - x_1)(x_3 - x_1)(x_3 - x_2).$$

Generalize to the 4 X 4 case, and to the n X n case.

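6. If $a(t)$, $b(t)$, $c(t)$, $d(t)$ are functions of $t$, one can form the determinant

$$\begin{vmatrix} a(t) & b(t) \\ c(t) & d(t) \end{vmatrix},$$

just as with numbers. Write out in full the determinant

$$\begin{vmatrix} \sin t & \cos t \\ -\cos t & \sin t \end{vmatrix}.$$

7. Write out in full the determinant

$$\begin{vmatrix} t + 1 & t - 1 \\ t & 2t + 5 \end{vmatrix}.$$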

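8. Let $f(t)$, $g(t)$ be two functions having derivatives of all orders. Let $\varphi(t)$
be the function obtained by taking the determinant

$$\varphi(t) = \begin{vmatrix} f(t) & g(t) \\ f'(t) & g'(t) \end{vmatrix}.$$

Show that

$$\varphi'(t) = \begin{vmatrix} f(t) & g(t) \\ f''(t) & g''(t) \end{vmatrix}$$

(i.e. the derivative is obtained by taking the derivative of the bottom row).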





9. Generalize Exercise 8 to the 3 X 3 case, and then to the n X n case.
[Hint: Expand the following determinant according to the first row.]

$$\begin{vmatrix} f_1 & f_2 & \cdots & f_n \\ f_1' & f_2' & \cdots & f_n' \\ \vdots & \vdots & & \vdots \\ f_1^{(n-1)} & f_2^{(n-1)} & \cdots & f_n^{(n-1)} \end{vmatrix}$$

10. Verify explicitly that the expression given for a 3 X 3 determinant in
Example 2 satisfies properties (1), (2), (3) with respect to rows.

§3. Cramer's rule

The rules of §2 give us powerful tools to determine when vectors are 
linearly dependent. 

Theorem 1. Let $A^1, \ldots, A^n$ be column vectors (of dimension $n$). If
they are linearly dependent, then

$$D(A^1, \ldots, A^n) = 0.$$

If $D(A^1, \ldots, A^n) \ne 0$, then $A^1, \ldots, A^n$ are linearly independent.

Proof. The second assertion is merely an equivalent formulation of the
first. It will therefore suffice to prove the first. Assume that $A^1, \ldots, A^n$
are linearly dependent. We can find numbers $x_1, \ldots, x_n$ not all 0 such
that

$$x_1A^1 + \cdots + x_nA^n = 0.$$

Suppose $x_j \ne 0$. Then

$$x_jA^j = -x_1A^1 - \cdots - x_nA^n = -\sum_{k \ne j} x_kA^k,$$

it being understood that the $j$-th term on the right-hand side does not
appear. Dividing by $x_j$, we obtain $A^j$ as a linear combination of $A^1, \ldots, A^n$
(omitting $A^j$). In other words, there are numbers $y_1, \ldots, y_n$ such that

$$A^j = y_1A^1 + \cdots + y_nA^n = \sum_{k \ne j} y_kA^k,$$

the $j$-th term in the sum being omitted. We get:

$$D(A^1, \ldots, A^n) = D\Big(A^1, \ldots, \sum_{k \ne j} y_kA^k, \ldots, A^n\Big),$$

which we can expand out using property (1). This yields

$$y_1D(A^1, \ldots, A^1, \ldots, A^n) + \cdots + y_nD(A^1, \ldots, A^n, \ldots, A^n).$$

Here again, the $j$-th term is omitted. In the other terms, we always have two
equal columns, and hence each such term is equal to 0 by property (2).
This proves Theorem 1.
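
Theorem 1 can be illustrated on a small example. The following is a minimal sketch in Python, not part of the original text: the column vectors are hypothetical, and the helpers `det` (expansion according to the first row) and `from_columns` are assumptions introduced for illustration.

```python
def det(A):
    """Determinant by expansion according to the first row."""
    if not A:
        return 1  # convention for the empty matrix
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def from_columns(cols):
    """Build the matrix (as a list of rows) whose columns are the given vectors."""
    return [list(row) for row in zip(*cols)]

# Hypothetical column vectors; A3 = 2*A1 - A2, so they are linearly dependent.
A1 = [1, 2, 0]
A2 = [3, -1, 4]
A3 = [2 * a - b for a, b in zip(A1, A2)]

dependent = det(from_columns([A1, A2, A3]))            # equals 0
independent = det(from_columns([A1, A2, [0, 0, 1]]))   # equals -7
print(dependent, independent)  # 0 -7
```

A non-zero value, as in the second case, shows by Theorem 1 that the three columns are linearly independent.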

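Theorem 2. Let $A^1, \ldots, A^n$ be column vectors (of dimension $n$) such
that $D(A^1, \ldots, A^n) \ne 0$. Let $B$ be a column vector of dimension $n$ also.
Then there exist numbers $x_1, \ldots, x_n$ such that

$$x_1A^1 + \cdots + x_nA^n = B,$$

and for each $j$, we have

$$x_j = \frac{D(A^1, \ldots, B, \ldots, A^n)}{D(A^1, \ldots, A^n)},$$

where $B$ occurs in the $j$-th column instead of $A^j$. In other words,

$$x_j = \frac{\begin{vmatrix} a_{11} & \cdots & b_1 & \cdots & a_{1n} \\ a_{21} & \cdots & b_2 & \cdots & a_{2n} \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & b_n & \cdots & a_{nn} \end{vmatrix}}
{\begin{vmatrix} a_{11} & \cdots & a_{1j} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2j} & \cdots & a_{2n} \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & a_{nj} & \cdots & a_{nn} \end{vmatrix}}.$$

(The numerator is obtained from $A$ by replacing the $j$-th column $A^j$ by $B$.
The denominator is the determinant of the matrix $A$.)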

Theorem 2 gives us an explicit way to find the coordinates of $B$ with
respect to $A^1, \ldots, A^n$. In the language of linear equations, Theorem 2
allows us to solve explicitly in terms of determinants the system of $n$
linear equations in $n$ unknowns:

$$\begin{aligned} x_1a_{11} + \cdots + x_na_{1n} &= b_1 \\ &\ \vdots \\ x_1a_{n1} + \cdots + x_na_{nn} &= b_n \end{aligned}$$

Proof. According to Theorem 1, the vectors $A^1, \ldots, A^n$ are linearly
independent, and hence constitute a basis of $\mathbf{R}^n$. Hence any vector $B$
can be written as a linear combination of $A^1, \ldots, A^n$. Let $B$ be written
as in the statement of the theorem, and consider the determinant of the
matrix obtained by replacing the $j$-th column of $A$ by $B$. Then

$$D(A^1, \ldots, B, \ldots, A^n) = D(A^1, \ldots, x_1A^1 + \cdots + x_nA^n, \ldots, A^n).$$

We use property (1) and obtain a sum:

$$D(A^1, \ldots, x_1A^1, \ldots, A^n) + \cdots + D(A^1, \ldots, x_jA^j, \ldots, A^n)
  + \cdots + D(A^1, \ldots, x_nA^n, \ldots, A^n),$$

which by property (1) again, is equal to

$$x_1D(A^1, \ldots, A^1, \ldots, A^n) + \cdots + x_jD(A^1, \ldots, A^j, \ldots, A^n)
  + \cdots + x_nD(A^1, \ldots, A^n, \ldots, A^n).$$

In every term of this sum except the $j$-th term, two column vectors are
equal. Hence every term except the $j$-th term is equal to 0, by property
(2). The $j$-th term is equal to

$$x_jD(A^1, \ldots, A^n),$$

and is therefore equal to the determinant we started with, namely
$D(A^1, \ldots, B, \ldots, A^n)$. We can solve for $x_j$, and obtain precisely the
expression given in the statement of the theorem.

The rule of Theorem 2, giving us the solution to the system of linear 
equations by means of determinants, is known as Cramer's rule. 

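Example. Solve the system of linear equations:

$$\begin{aligned} 3x + 2y + 4z &= 1, \\ 2x - y + z &= 0, \\ x + 2y + 3z &= 1. \end{aligned}$$

We have:

$$x = \frac{\begin{vmatrix} 1 & 2 & 4 \\ 0 & -1 & 1 \\ 1 & 2 & 3 \end{vmatrix}}{\begin{vmatrix} 3 & 2 & 4 \\ 2 & -1 & 1 \\ 1 & 2 & 3 \end{vmatrix}},
\qquad
y = \frac{\begin{vmatrix} 3 & 1 & 4 \\ 2 & 0 & 1 \\ 1 & 1 & 3 \end{vmatrix}}{\begin{vmatrix} 3 & 2 & 4 \\ 2 & -1 & 1 \\ 1 & 2 & 3 \end{vmatrix}},
\qquad
z = \frac{\begin{vmatrix} 3 & 2 & 1 \\ 2 & -1 & 0 \\ 1 & 2 & 1 \end{vmatrix}}{\begin{vmatrix} 3 & 2 & 4 \\ 2 & -1 & 1 \\ 1 & 2 & 3 \end{vmatrix}}.$$

Observe how the column

$$B = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$$

shifts from the first column when solving for $x$, to the second column
when solving for $y$, to the third column when solving for $z$. The
denominator in all three expressions is the same, namely it is the determinant of
the matrix of coefficients of the equations.

The determinants involved are easy to compute. The denominator is equal
to $-5$, and the three numerators are equal to $1$, $0$, $-2$. One finds:

$$x = -\tfrac{1}{5}, \qquad y = 0, \qquad z = \tfrac{2}{5}.$$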
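
Cramer's rule can be carried out mechanically. The following is a minimal sketch in Python applied to the system of the example, not part of the original text: the function names `det` and `cramer` are assumptions introduced for illustration, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction

def det(A):
    """Determinant by expansion according to the first row."""
    if not A:
        return 1  # convention for the empty matrix
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def cramer(A, B):
    """Solve the system with coefficient matrix A and right-hand side B."""
    d = det(A)
    if d == 0:
        raise ValueError("determinant is 0; Cramer's rule does not apply")
    solution = []
    for j in range(len(A)):
        # Replace the j-th column of A by the column B.
        Aj = [row[:j] + [B[i]] + row[j + 1:] for i, row in enumerate(A)]
        solution.append(Fraction(det(Aj), d))
    return solution

A = [[3, 2, 4],
     [2, -1, 1],
     [1, 2, 3]]
B = [1, 0, 1]
print(cramer(A, B))  # [Fraction(-1, 5), Fraction(0, 1), Fraction(2, 5)]
```

Substituting $x = -1/5$, $y = 0$, $z = 2/5$ back into the three equations confirms the solution.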






Exercises 

1. Write down systems of $n$ linear equations in $n$ unknowns (with $n = 3$
and $n = 4$) and solve them by determinants until you feel that you have
absorbed the technique completely.

2. Let $A^1, \ldots, A^n$ be column vectors of dimension $n$ and assume that they
are linearly independent. Show that $D(A^1, \ldots, A^n) \ne 0$. [Hint: Express
each one of the standard unit vectors $E^1, \ldots, E^n$ viewed as column vectors
as linear combinations of $A^1, \ldots, A^n$. Using the fact that $D(E^1, \ldots, E^n) = 1$,
and properties (1) and (2), prove the assertion.]

3. Let $A$ be a triangular n X n matrix, say a matrix such that all components
below the diagonal are equal to 0,

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix}.$$

What is $D(A)$?

§4. Inverse of a matrix 

Let $A$ be an n X n matrix. If $\mathrm{Det}(A) \ne 0$, then we say that $A$ is a
non-singular matrix. If $B$ is a matrix such that $AB = I$ and $BA = I$
($I$ = unit n X n matrix), then we say that $B$ is an inverse of $A$, and we
write $B = A^{-1}$. If there exists an inverse of $A$, then it is unique. Indeed,
let $C$ be an inverse of $A$. Then $CA = I$. Multiplying by $B$ on the right,
we obtain $CAB = B$. But $CAB = C(AB) = CI = C$. Hence $C = B$.
A similar argument works for $AC = I$.

We shall see that property (7) allows us to construct an inverse for a 
non-singular matrix. 

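Let $A = (a_{ij})$ be a given n X n matrix. We had defined the matrix
$A_{ij}$ obtained from $A$ by deleting the $i$-th row and $j$-th column. Let

$$b_{ij} = (-1)^{i+j}\,\mathrm{Det}(A_{ji}).$$

(Note the reversal of indices!) Let $d = \mathrm{Det}(A)$. The matrix $dI$ is then
diagonal:

$$dI = \begin{pmatrix} d & 0 & \cdots & 0 \\ 0 & d & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & d \end{pmatrix}.$$

Theorem 3. Let $B$ be the matrix $(b_{ij})$. Then

$$AB = BA = dI.$$

If $d \ne 0$, then $A^{-1} = \dfrac{1}{d}B$.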





Proof. We shall use property (7). For any pair of indices $i$, $k$ the
$ik$-component of $AB$ is

$$a_{i1}b_{1k} + a_{i2}b_{2k} + \cdots + a_{in}b_{nk}
  = a_{i1}(-1)^{k+1}\,\mathrm{Det}(A_{k1}) + \cdots + a_{in}(-1)^{k+n}\,\mathrm{Det}(A_{kn}).$$

If $i = k$, then this sum is simply the expansion of the determinant
according to the $i$-th row, and hence this sum is equal to $d$. If $i \ne k$, let
$\bar{A}$ be the matrix obtained from $A$ by replacing the $k$-th row by the $i$-th
row, and leaving all other rows unchanged. If we delete the $k$-th row and
$j$-th column from $\bar{A}$, we obtain the same matrix as by deleting the $k$-th
row and $j$-th column from $A$. Thus

$$\bar{A}_{kj} = A_{kj},$$

and hence our sum above can be written

$$a_{i1}(-1)^{k+1}\,\mathrm{Det}(\bar{A}_{k1}) + \cdots + a_{in}(-1)^{k+n}\,\mathrm{Det}(\bar{A}_{kn}).$$

This is the expansion of the determinant of $\bar{A}$ according to the $k$-th row,
whose entries are $a_{i1}, \ldots, a_{in}$. The matrix $\bar{A}$ has two equal rows, so by
property (2), applied to rows, $\mathrm{Det}(\bar{A}) = 0$. Hence this sum is 0. We have therefore
proved that the $ik$-component of $AB$ is equal to $d$ if $i = k$ (i.e. if it is a
diagonal component), and equal to 0 if $i \ne k$ (i.e. if it is off the diagonal).

Similarly, one can prove that $BA = dI$, thereby proving the first
assertion in the theorem. The second assertion follows from the rule
$A\left(\frac{1}{d}B\right) = \frac{1}{d}(AB)$, and similarly for $BA$. The inverse of the
matrix $A$ is therefore obtained by taking the transpose of the matrix

$$\left(\frac{(-1)^{i+j}\,\mathrm{Det}(A_{ij})}{\mathrm{Det}(A)}\right).$$
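
The construction of the inverse in Theorem 3 can be written out as follows. This is a minimal sketch in Python, not part of the original text: the names `minor`, `det`, `adjugate`, `inverse` and `mat_mul` are assumptions introduced for illustration, indices are 0-based, and the matrix is the one from Example 3 of §2.

```python
from fractions import Fraction

def minor(A, i, j):
    """The matrix A_ij: delete row i and column j of A (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by expansion according to the first row."""
    if not A:
        return 1  # convention for the empty matrix
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def adjugate(A):
    """The matrix B with b_ij = (-1)^(i+j) Det(A_ji); note the reversed indices."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)]
            for i in range(n)]

def inverse(A):
    """The inverse (1/d) B, defined when d = Det(A) is not 0."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(b, d) for b in row] for row in adjugate(A)]

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[3, 0, 1],
     [1, 2, 5],
     [-1, 4, 2]]
print(mat_mul(A, adjugate(A)))  # [[-42, 0, 0], [0, -42, 0], [0, 0, -42]]
```

The printed product is $dI$ with $d = -42$, as the theorem asserts; `mat_mul(A, inverse(A))` is then the unit matrix.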

Exercises 

1. Find the inverses of the matrices in Exercise 1, §2. 

2. Using the fact that if $A$, $B$ are two n X n matrices then $\mathrm{Det}(AB) =
\mathrm{Det}(A)\,\mathrm{Det}(B)$, prove that a matrix $A$ such that $\mathrm{Det}(A) = 0$ does not have
an inverse. [The above-mentioned fact will be proved later.]

3. Write down explicitly the inverse of a 2 X 2 matrix

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$

4. Verify explicitly the rule $D(AB) = D(A)D(B)$ for 2 X 2 matrices.









§5. Proofs of some properties 

In this section, we shall prove properties (4) and (5) from properties 
(1) and (2). In fact, let us assume that our determinant satisfies (1) and 
only the following weaker version of property (2): 

(2') If two adjacent columns are equal, then the determinant is equal 
to 0. 

We shall prove (5), (2), (4) in that order. 

Let $j$ be some integer, $1 \le j < n$. We shall first prove (5), namely:
If the $j$-th and $(j + 1)$-th columns are interchanged, then the determinant
changes by a sign.

In the matrix $A$, we replace the $j$-th and $(j + 1)$-th columns by
$A^j + A^{j+1}$. We obtain a matrix with two equal adjacent columns, and
by (2') we have:

$$0 = D(\ldots, A^j + A^{j+1}, A^j + A^{j+1}, \ldots).$$

Expanding out using (1) repeatedly yields

$$0 = D(\ldots, A^j, A^j, \ldots) + D(\ldots, A^{j+1}, A^j, \ldots)
  + D(\ldots, A^j, A^{j+1}, \ldots) + D(\ldots, A^{j+1}, A^{j+1}, \ldots).$$

Using (2'), we see that two of these four terms are equal to 0, and hence
that

$$0 = D(\ldots, A^{j+1}, A^j, \ldots) + D(\ldots, A^j, A^{j+1}, \ldots).$$

In this last sum, one term must be equal to minus the other, as desired.

Our original property (2) can now be proved easily. Indeed, assume
that two columns of the matrix $A$ are equal. We can change the matrix
by a successive interchange of adjacent columns until we obtain a matrix
with equal adjacent columns. (This could be proved formally by
induction.) Each time that we make such an adjacent interchange, the
determinant changes by a sign, which does not affect its being 0 or not.
Hence we conclude by (2') that $D(A) = 0$ if two columns are equal.

Let us now prove property (4). Consider two distinct columns, say
the $k$-th and $j$-th columns $A^k$ and $A^j$ with $k \ne j$. Let $t$ be a number.
We add $tA^j$ to $A^k$. By (1), the determinant becomes

$$D(\ldots, A^k + tA^j, \ldots) = D(\ldots, A^k, \ldots) + D(\ldots, tA^j, \ldots),$$

where in each determinant the column written out stands in the $k$-th place.

In both terms on the right, the indicated column occurs in the $k$-th
place. But $D(\ldots, A^k, \ldots)$ is simply $D(A)$. Furthermore,

$$D(\ldots, tA^j, \ldots) = t\,D(\ldots, A^j, \ldots).$$

Since $k \ne j$, the determinant on the right has two equal columns, because
$A^j$ occurs in the $k$-th place and also in the $j$-th place. Hence it is equal to
0. Hence

$$D(\ldots, A^k + tA^j, \ldots) = D(\ldots, A^k, \ldots),$$

thereby proving our property (4).
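
Properties (4) and (5) can be observed on a numerical example. The following is a minimal sketch in Python, not part of the original text: the helper names `det`, `add_multiple_of_column` and `swap_columns` are assumptions introduced for illustration, columns are numbered from 0, and the matrix is the one from Example 3 of §2.

```python
def det(A):
    """Determinant by expansion according to the first row."""
    if not A:
        return 1  # convention for the empty matrix
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def add_multiple_of_column(A, k, j, t):
    """Return a copy of A in which t times column j has been added to column k."""
    return [[a + t * row[j] if c == k else a for c, a in enumerate(row)]
            for row in A]

def swap_columns(A, j, k):
    """Return a copy of A with columns j and k interchanged."""
    B = [row[:] for row in A]
    for row in B:
        row[j], row[k] = row[k], row[j]
    return B

A = [[3, 0, 1],
     [1, 2, 5],
     [-1, 4, 2]]
print(det(A))                                   # -42
print(det(add_multiple_of_column(A, 0, 2, 5)))  # -42, property (4)
print(det(swap_columns(A, 0, 1)))               # 42, property (5)
```

A check on one matrix is of course not a proof; it only illustrates what the argument above establishes in general.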

§6. Uniqueness 

Before proceeding with the main part of the argument, we make some
remarks on repeated linear maps, as in property (1). Consider first the
case $n = 2$. If we have to expand

$$D(3A + 5B, 2A - B),$$

where $A$, $B$ are 2-vectors, then using property (1), we obtain a sum of four
terms, namely

$$\begin{aligned} D(3A, 2A - B) + D(5B, 2A - B)
  &= D(3A, 2A) + D(3A, -B) + D(5B, 2A) + D(5B, -B) \\
  &= 6D(A, A) - 3D(A, B) + 10D(B, A) - 5D(B, B). \end{aligned}$$

Observe that we have used the fact that

$$D(A, -B) = -D(A, B),$$

taking $t = -1$ in property (1).

In such an expression, we note that $D(A, A) = 0$ and $D(B, B) = 0$.
Thus only two terms remain to give a contribution which is not a priori
equal to 0.

To give another example, let us expand

$$D(2A + B - C, 3E + F)$$

where $A$, $B$, $C$, $E$, $F$ are vectors. Using (1) repeatedly, we obtain six
terms, namely

$$6D(A, E) + 2D(A, F) + 3D(B, E) + D(B, F) - 3D(C, E) - D(C, F).$$

[As an exercise, prove this expansion writing out in detail each step as an
application of property (1).]

The principle involved in expanding such determinants is the following.
We select one term from the first column and one term from the second,
and take the sum over all such terms. A proof for the general expansion
rule for arbitrary $n$ could be given by induction, but we shall omit it.
Using expansions such as these, we shall now show that any function of
n X n matrices satisfying properties (1), (2), (3) is uniquely determined,
and we shall obtain an expansion for such functions.

Let $A$ be an n X n matrix, with column vectors $A^1, \ldots, A^n$. Then
we can write

$$\begin{aligned} A^1 &= a_{11}E^1 + \cdots + a_{n1}E^n \\ &\ \vdots \\ A^n &= a_{1n}E^1 + \cdots + a_{nn}E^n \end{aligned}$$

where $E^1, \ldots, E^n$ are the unit column vectors. Then

$$D(A^1, \ldots, A^n) = D(a_{11}E^1 + \cdots + a_{n1}E^n, \ldots, a_{1n}E^1 + \cdots + a_{nn}E^n).$$

Using property (1), we can express this as a sum of terms

$$D(a_{\sigma(1),1}E^{\sigma(1)}, \ldots, a_{\sigma(n),n}E^{\sigma(n)}),$$

where $\sigma(1), \ldots, \sigma(n)$ denotes a choice of an integer between 1 and $n$ for
each value of $1, \ldots, n$. Thus $\sigma$ is a mapping of the set of integers
$\{1, \ldots, n\}$ into itself. By property (1), each one of the above terms can
also be written

$$a_{\sigma(1),1} \cdots a_{\sigma(n),n}\,D(E^{\sigma(1)}, \ldots, E^{\sigma(n)}).$$

If some $\sigma$ assigns the same integer to distinct values $i$, $j$ between 1 and $n$,
then the determinant on the right has two equal columns and hence is
equal to 0. Consequently, we can take our sum only for those $\sigma$ which
are such that $\sigma(i) \ne \sigma(j)$ whenever $i \ne j$. Such $\sigma$ are called permutations.
Instead of saying that we take the sum for all permutations $\sigma$, we
abbreviate the notation with the usual $\sum$ symbol. Thus we can write

$$D(A^1, \ldots, A^n) = \sum_{\sigma} a_{\sigma(1),1} \cdots a_{\sigma(n),n}\,D(E^{\sigma(1)}, \ldots, E^{\sigma(n)}).$$

The unit vectors $E^{\sigma(1)}, \ldots, E^{\sigma(n)}$ occur in a permutation of the standard
arrangement $E^1, \ldots, E^n$. If we interchange successively two adjacent
columns, we can reestablish the standard order for these unit vectors after
a certain number of adjacent permutations. Each time that we permute
two adjacent columns, the determinant changes by a sign. If $m(\sigma)$ is
the number of transpositions of adjacent column vectors which have to
be carried out to reestablish the standard ordering of the unit vectors,
then, using property (3),

$$D(E^{\sigma(1)}, \ldots, E^{\sigma(n)}) = (-1)^{m(\sigma)}D(E^1, \ldots, E^n) = (-1)^{m(\sigma)}.$$

The sign $(-1)^{m(\sigma)}$ will be denoted by $\epsilon(\sigma)$ and will be called the sign of
the permutation. Thus finally, we can write

$$(*) \qquad D(A^1, \ldots, A^n) = \sum_{\sigma} \epsilon(\sigma)\,a_{\sigma(1),1} \cdots a_{\sigma(n),n},$$

the sum being taken over all permutations of the integers $\{1, \ldots, n\}$.
This expression shows that the value of the determinant is uniquely
determined by properties (1), (2), (3), or even (1), (2'), (3) since we saw
that (2) follows from these.
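
The expansion (*) can be evaluated directly for small matrices. The following is a minimal sketch in Python, not part of the original text: the names `sign` and `det_by_permutations` are assumptions introduced for illustration, and integers are numbered from 0. The sign is computed from the number of inversions of the permutation, which is the number of adjacent interchanges needed to restore the standard order.

```python
from itertools import permutations

def sign(sigma):
    """The sign of sigma, computed as (-1)^(number of inversions)."""
    n = len(sigma)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1

def det_by_permutations(A):
    """Sum over all permutations of sign(sigma) * a_{sigma(1),1} ... a_{sigma(n),n}."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = sign(sigma)
        for j in range(n):
            term *= A[sigma[j]][j]  # the entry in row sigma(j), column j
        total += term
    return total

# The matrix of Example 3 in §2.
A = [[3, 0, 1],
     [1, 2, 5],
     [-1, 4, 2]]
print(det_by_permutations(A))  # -42
```

The sum has n! terms, so this form is useful for proofs rather than for computation with large matrices.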

§7. Determinant of a transpose

Before proceeding any further, we make some comments on
permutations. If $\sigma$, $\tau$ are two permutations of the integers $\{1, \ldots, n\}$ then we
can form the composite permutation $\sigma \circ \tau$, such that $(\sigma \circ \tau)(i) = \sigma(\tau(i))$.
Given a permutation $\sigma$, and an integer $i$, there exists a unique integer $j$
such that $\sigma(j) = i$. (Here and afterwards, we let $i, j, \ldots$ denote integers
between 1 and $n$.) We can define a permutation $\sigma^{-1}$ by the rule: $\sigma^{-1}(i)$
is the unique $j$ such that $\sigma(j) = i$. Then $\sigma \circ \sigma^{-1} = \sigma^{-1} \circ \sigma = \mathrm{id}$ is the
identity permutation (the permutation such that $\mathrm{id}(i) = i$ for all $i$).

Instead of writing $\sigma \circ \tau$, we shall also write $\sigma\tau$. Our last remark can
then be stated $\sigma\sigma^{-1} = \sigma^{-1}\sigma = \mathrm{id}$.

A transposition is a permutation which interchanges two numbers and
leaves the others fixed. Every permutation can be expressed as a product
of transpositions. This can easily be seen as follows. Suppose we have a
permutation $(\sigma(1), \ldots, \sigma(n))$. If $n = \sigma(j)$ for some $j \ne n$, then we
compose $\sigma$ with the transposition $\tau$ which interchanges $\sigma(j)$ and $\sigma(j + 1)$.
Then $\tau\sigma(j) = \sigma(j + 1)$. The effect of $\tau$ is to move $n$ one step further to
the right. We continue this until $n$ reaches the last position. We then
repeat the procedure, moving successively $n - 1$ furthest to the right,
then $n - 2$, etc. Finally, we have a sequence of transpositions $\tau_1, \ldots, \tau_s$
such that $\tau_s\tau_{s-1} \cdots \tau_1\sigma = \mathrm{id}$. Then $\sigma = \tau_1^{-1} \cdots \tau_{s-1}^{-1}\tau_s^{-1}$. Since the
inverse of a transposition is a transposition, we have expressed $\sigma$ as a
product of transpositions.

When a permutation $\sigma$ is expressed as a product of transpositions, then
it can be shown that the parity of the number of transpositions is always
the same. In other words, suppose that

$$\sigma = \tau_1 \cdots \tau_r = \tau_1' \cdots \tau_s'$$

are two ways of expressing $\sigma$ as a product of transpositions. If $r$ is even,
then so is $s$, and if $r$ is odd, then $s$ is odd also. (Hints for the proof will be
given in an exercise.) Consequently, the number $(-1)^r$ is the same as
$(-1)^s$, and is called the sign of $\sigma$. It is the number $\epsilon(\sigma)$ which occurred
above. It is also sometimes written $\mathrm{sign}(\sigma)$.

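If $\sigma$, $\sigma'$ are two permutations, and we write

$$\sigma = \tau_1 \cdots \tau_r \quad\text{and}\quad \sigma' = \tau_1' \cdots \tau_s'$$

as products of transpositions, then

$$\sigma\sigma' = \tau_1 \cdots \tau_r\tau_1' \cdots \tau_s'.$$

Hence

$$\epsilon(\sigma\sigma') = (-1)^{r+s} = \epsilon(\sigma)\epsilon(\sigma').$$

The sign of a product of permutations is equal to the product of the signs.
In particular, since $\sigma\sigma^{-1} = \mathrm{id}$, we obtain

$$1 = \epsilon(\sigma\sigma^{-1}) = \epsilon(\sigma)\epsilon(\sigma^{-1}).$$

Consequently $\epsilon(\sigma) = \epsilon(\sigma^{-1})$.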
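
The procedure of moving the largest number to the right by adjacent interchanges gives a way to compute the sign. The following is a minimal sketch in Python, not part of the original text: the names `sign`, `compose` and `inverse` are assumptions introduced for illustration, a permutation is stored as the tuple $(\sigma(0), \ldots, \sigma(n-1))$ with integers numbered from 0, and the two rules proved above are checked by brute force for $n = 4$.

```python
from itertools import permutations

def sign(sigma):
    """Sort the values by adjacent interchanges and count them; return (-1)^count."""
    values = list(sigma)
    swaps = 0
    for end in range(len(values) - 1, 0, -1):
        # Move the largest remaining value step by step to position `end`.
        for j in range(end):
            if values[j] > values[j + 1]:
                values[j], values[j + 1] = values[j + 1], values[j]
                swaps += 1
    return (-1) ** swaps

def compose(sigma, tau):
    """The composite permutation, sending i to sigma(tau(i))."""
    return tuple(sigma[tau[i]] for i in range(len(tau)))

def inverse(sigma):
    """The inverse permutation: inverse(sigma)[i] is the j with sigma[j] == i."""
    inv = [0] * len(sigma)
    for j, i in enumerate(sigma):
        inv[i] = j
    return tuple(inv)

# Check both rules for every permutation of four integers.
perms = list(permutations(range(4)))
for s in perms:
    assert sign(inverse(s)) == sign(s)
    for t in perms:
        assert sign(compose(s, t)) == sign(s) * sign(t)
```

Such a check covers only one value of $n$; the general statement rests on the parity result quoted above.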

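We return to our discussion of determinants, and to the expression (*)
which we obtained.

Let $\sigma$ be a permutation of $\{1, \ldots, n\}$. If $\sigma(j) = k$, then $\sigma^{-1}(k) = j$.
We can therefore write

$$a_{\sigma(j),j} = a_{k,\sigma^{-1}(k)}.$$

In a product

$$a_{\sigma(1),1} \cdots a_{\sigma(n),n}$$

each integer $k$ from 1 to $n$ occurs precisely once among the integers
$\sigma(1), \ldots, \sigma(n)$. Hence this product can be written

$$a_{1,\sigma^{-1}(1)} \cdots a_{n,\sigma^{-1}(n)},$$

and our sum (*) is equal to

$$\sum_{\sigma} \epsilon(\sigma^{-1})\,a_{1,\sigma^{-1}(1)} \cdots a_{n,\sigma^{-1}(n)},$$

because $\epsilon(\sigma) = \epsilon(\sigma^{-1})$. In this sum, each term corresponds to a
permutation $\sigma$. However, as $\sigma$ ranges over all permutations, so does $\sigma^{-1}$
because a permutation determines its inverse uniquely. Hence our sum is
equal to

$$(**) \qquad \sum_{\sigma} \epsilon(\sigma)\,a_{1,\sigma(1)} \cdots a_{n,\sigma(n)}.$$

The sum (**) is precisely the sum giving the expanded form of the
determinant of the transpose of $A$. Hence we have proved property (6),
namely

$$\mathrm{Det}(A) = \mathrm{Det}({}^tA)$$

from properties (1), (2), (3).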

So far, we have shown: If D is any function of n X n matrices satisfying 
(1), our weak property (2'), and (3), then D satisfies (2), (4), (5), (6), and is 
uniquely determined. 


§8. Existence 

We come to the question of existence. Do determinants exist at all? 
The answer is yes, and we prove this by induction. 

When n = 1, we deal with 1 × 1 determinants, and all our properties 
are obvious if we define Det(a) = a, for any number a. 

To prove our assertion in general, it will suffice to give an argument 
which allows us to proceed stepwise. Suppose therefore that we have 
been able to define determinants for all integers < n, satisfying our 
properties. Let A be an n × n matrix, A = (a_{ij}). Let i be an integer, 
1 ≤ i ≤ n. We define 

(***)    Δ(A) = (−1)^{i+1} a_{i1} Det(A_{i1}) + ⋯ + (−1)^{i+n} a_{in} Det(A_{in}). 

Each A_{ij} is an (n − 1) × (n − 1) matrix. We shall prove that our 
function Δ satisfies properties (1), (3), and the weak version of property (2). 

(Observe that (***) is the expression we would get from expanding a 
determinant according to the i-th row.) 

Note that Δ(A) is a sum of terms 

(−1)^{i+j} a_{ij} Det(A_{ij}) 

as j ranges from 1 to n. 

(1) Consider Δ as a function of the k-th column, and consider any term 

(−1)^{i+j} a_{ij} Det(A_{ij}). 

If j ≠ k, then a_{ij} does not depend on the k-th column, and Det(A_{ij}) 
depends linearly on the k-th column. If j = k, then a_{ij} depends linearly 
on the k-th column, and Det(A_{ij}) does not depend on the k-th column. 
In any case, our term depends linearly on the k-th column. Since Δ(A) 
is a sum of such terms, it depends linearly on the k-th column, and property 
(1) follows. 

(2') We prove the weak property (2'). Suppose two adjacent columns of 
A are equal, namely A^k = A^{k+1}. Let j be an index ≠ k or k + 1. Then 
the matrix A_{ij} has two adjacent equal columns, and hence its determinant 
is equal to 0. Thus the term corresponding to an index j ≠ k or k + 1 
gives a zero contribution to Δ(A). The other two terms can be written 

(−1)^{i+k} a_{ik} Det(A_{ik}) + (−1)^{i+k+1} a_{i,k+1} Det(A_{i,k+1}). 

The two matrices A_{ik} and A_{i,k+1} are equal because of our assumption 
that the k-th column of A is equal to the (k + 1)-th column. Similarly, 
a_{ik} = a_{i,k+1}. Hence these two terms cancel since they occur with opposite 
signs. This proves the weak property (2'). 

(3) Let A be the unit matrix. Then a_{ij} = 0 unless i = j, in which 
case a_{ii} = 1. Each A_{ii} is the unit (n − 1) × (n − 1) matrix. The 
only term in the sum (***) which gives a non-zero contribution is 

(−1)^{i+i} a_{ii} Det(A_{ii}), 

which is equal to 1. This proves property (3). 

From §6 we conclude that Δ is the unique determinant function satisfying 
(1), (2), (3). Furthermore, Δ also satisfies the rule for the expansion of 
determinants according to rows, i.e. property (7) is satisfied. Everything 
is proved. 
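
The inductive definition (***) translates directly into a recursive procedure. The sketch below is an illustration only, with hypothetical names `minor` and `det_by_row` and 0-based indices (which leave the parity of i + j unchanged). It expands along a chosen row i and uses the first row for the smaller determinants.

```python
def minor(a, i, j):
    """The matrix A_ij: delete row i and column j (0-based) from a."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]


def det_by_row(a, i=0):
    """Determinant of a square matrix by expansion along row i, as in (***)."""
    n = len(a)
    if n == 1:
        return a[0][0]  # Det(a) = a for 1 x 1 matrices
    return sum(
        (-1) ** (i + j) * a[i][j] * det_by_row(minor(a, i, j))
        for j in range(n)
    )


A = [[2, 0, 1],
     [1, 3, -1],
     [0, 5, 4]]

# Expanding along any of the three rows gives the same number.
print([det_by_row(A, i) for i in range(3)])

# Property (3): the unit matrix has determinant 1.
print(det_by_row([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))

# Weak property (2'): two equal adjacent columns give determinant 0.
print(det_by_row([[1, 1, 2], [3, 3, 4], [5, 5, 6]]))
```

That every row gives the same value reflects the uniqueness statement: each choice of i defines a function satisfying (1), (2'), (3), and there is only one such function.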


Exercises 

1. Let x_1, ..., x_n be variables and let σ be a permutation of the numbers 
{1, ..., n}. Then there is a number ε(σ), equal to +1 or −1, such that 

Π_{i<j} (x_{σ(i)} − x_{σ(j)}) = ε(σ) Π_{i<j} (x_i − x_j), 

where the symbol Π_{i<j} means that one should take the product over all pairs of 
integers i, j such that 1 ≤ i < j ≤ n. If τ is a transposition, show that 
ε(τ) = −1. 

2. Show that ε(σ) is equal to (−1)^m, where m is the number of pairs (i, j) 
such that 1 ≤ i < j ≤ n and σ(i) > σ(j). This number m is called the number 
of inversions of σ. 

3. Let σ' be any permutation of {1, ..., n}. Substitute x_{σ'(k)} for x_k in the 
product of Exercise 1. Conclude that 

ε(σσ') = ε(σ)ε(σ'). 


4. A permutation σ of the integers {1, ..., n} is sometimes denoted by 

    [  1     ⋯    n   ] 
    [ σ(1)   ⋯   σ(n) ] 

Thus 

    [ 1  2  3 ] 
    [ 2  1  3 ] 

denotes the permutation σ such that σ(1) = 2, σ(2) = 1, σ(3) = 3. This 
permutation is in fact a transposition. Determine the sign of the following 
permutations. 

    [ 1  2  3 ] 
    [ 2  3  1 ] 






5. In each one of the cases of Exercise 4, write the inverse of the permutation. 

6. Show that the number of odd permutations of {1, ..., n} for n ≥ 2 is 
equal to the number of even permutations. 

§9. Determinant of a product 

We shall prove the important rule: 

Theorem 4. Let A, B be two n × n matrices. Then 

Det(AB) = Det(A) Det(B). 

The determinant of a product is equal to the product of the determinants. 

Proof. Let A = (a_{ij}) and B = (b_{jk}). 
Let AB = C, and let C^k be the k-th column of C. Then by definition, 

C^k = b_{1k}A^1 + ⋯ + b_{nk}A^n. 

Thus 

D(AB) = D(C^1, ..., C^n) 

      = D(b_{11}A^1 + ⋯ + b_{n1}A^n, ..., b_{1n}A^1 + ⋯ + b_{nn}A^n). 

If we expand this out using property (1), we find a sum 

Σ_σ D(b_{σ(1),1}A^{σ(1)}, ..., b_{σ(n),n}A^{σ(n)}) 

      = Σ_σ b_{σ(1),1} ⋯ b_{σ(n),n} D(A^{σ(1)}, ..., A^{σ(n)}). 

According to the formula for determinants which we found, this is equal 
to D(B) D(A), as was to be shown. 
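
Theorem 4 is easy to test on examples. The sketch below is only an illustration under stated assumptions: `det` implements the expanded sum over permutations with the column convention a_{σ(1),1} ⋯ a_{σ(n),n}, the sign is computed by counting inversions, and `sign`, `det` and `matmul` are hypothetical names chosen for this example. Integer entries keep the arithmetic exact.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation (tuple of 0-based images), via its inversions."""
    n = len(perm)
    inversions = sum(
        1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def det(a):
    """Determinant as the sum over all permutations s of sign(s) * prod a[s(j)][j]."""
    n = len(a)
    total = 0
    for s in permutations(range(n)):
        term = sign(s)
        for j in range(n):
            term *= a[s[j]][j]
        total += term
    return total


def matmul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
B = [[1, 2, 0], [0, 1, 4], [3, -1, 2]]

print(det(A), det(B), det(matmul(A, B)))
```

The sum over permutations has n! terms, so this form is useful for checking small cases rather than for computation.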


CHAPTER XV 


Complex Numbers 


One of the advantages of dealing with the real numbers instead of the 
rational numbers is that certain equations which have no solutions in the 
rational numbers have a solution in real numbers. For instance, x^2 = 2 
is such an equation. However, we also know some equations having no 
solution in real numbers, for instance x^2 = −1, or x^2 = −2. In this 
chapter, we define a new kind of number where such equations have 
solutions. What we have simply called numbers in the preceding chapters 
will now be called real numbers. The new kind of numbers will be called 
complex numbers. 

§1. Definition 

The complex numbers are a set of objects which can be added and 
multiplied, the sum and product of two complex numbers being also a 
complex number, and satisfy the following conditions. 

(1) Every real number is a complex number, and if α, β are real numbers, 
then their sum and product as complex numbers are the same as their 
sum and product as real numbers. 

(2) There is a complex number denoted by i such that i^2 = −1. 

(3) Every complex number can be written uniquely in the form a + bi 
where a, b are real numbers. 

(4) The ordinary laws of arithmetic concerning addition and 
multiplication are satisfied. We list these laws: 

If α, β, γ are complex numbers, then (αβ)γ = α(βγ), and 

(α + β) + γ = α + (β + γ). 

We have α(β + γ) = αβ + αγ, and (β + γ)α = βα + γα. 

We have αβ = βα, and α + β = β + α. 

If 1 is the real number one, then 1α = α. 

If 0 is the real number zero, then 0α = 0. 

We have α + (−1)α = 0. 

We shall now draw consequences of these properties. With each complex 
number a + bi, we associate the vector (a, b) in the plane. Let 
α = a_1 + a_2 i and β = b_1 + b_2 i be two complex numbers. Then 

α + β = a_1 + b_1 + (a_2 + b_2)i. 

Hence addition of complex numbers is carried out “componentwise” and 
corresponds to addition of vectors in the plane. For example, 

(2 + 3i) + (−1 + 5i) = 1 + 8i. 

In multiplying complex numbers, we use the rule i^2 = −1 to simplify 
a product and to put it in the form a + bi. For instance, let α = 2 + 3i 
and β = 1 − i. Then 

αβ = (2 + 3i)(1 − i) = 2(1 − i) + 3i(1 − i) 

   = 2 − 2i + 3i − 3i^2 
   = 2 + i − 3(−1) 
   = 2 + 3 + i 
   = 5 + i. 


Let α = a + bi be a complex number. We define ᾱ to be a − bi. 
Thus if α = 2 + 3i, then ᾱ = 2 − 3i. The complex number ᾱ is called 
the conjugate of α. We see at once that 

αᾱ = a^2 + b^2. 

With the vector interpretation of complex numbers, we see that αᾱ is 
the square of the distance of the point (a, b) from the origin. 

We now have one more important property of complex numbers, which 
will allow us to divide by complex numbers other than 0. 

If α = a + bi is a complex number ≠ 0, and if we let 

λ = ᾱ / (a^2 + b^2), 

then αλ = λα = 1. 

The proof of this property is an immediate consequence of the law of 
multiplication of complex numbers, because 

α · ᾱ/(a^2 + b^2) = αᾱ/(a^2 + b^2) = 1. 

The number λ above is called the inverse of α, and is denoted by α^{-1} or 
1/α. If α, β are complex numbers, we often write β/α instead of α^{-1}β 
(or βα^{-1}), just as we did with real numbers. We see that we can divide 
by complex numbers ≠ 0. 

We define the absolute value of a complex number α = a_1 + ia_2 to be 

|α| = √(a_1^2 + a_2^2). 

This absolute value is none other than the length of the vector (a_1, a_2). 




In terms of absolute values, we can write 

α^{-1} = ᾱ / |α|^2 

provided α ≠ 0. 

The triangle inequality for the length of vectors can now be stated for 
complex numbers. If α, β are complex numbers, then 

|α + β| ≤ |α| + |β|. 

Another property of the absolute value is given in Exercise 5. 
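
The rules of this section can be tried out numerically. The sketch below uses Python's built-in `complex` type, which is an assumption of this illustration and not something the text relies on; the variable names are hypothetical. It reproduces the worked product (2 + 3i)(1 − i) and checks the formula for the inverse.

```python
alpha = complex(2, 3)   # 2 + 3i
beta = complex(1, -1)   # 1 - i

# Product, simplified with i^2 = -1: (2 + 3i)(1 - i) = 5 + i.
product = alpha * beta

# alpha times its conjugate is a^2 + b^2, here 4 + 9.
norm_sq = (alpha * alpha.conjugate()).real

# The inverse of alpha is conj(alpha) / |alpha|^2.
inv = alpha.conjugate() / abs(alpha) ** 2

# Triangle inequality |alpha + beta| <= |alpha| + |beta|.
triangle_holds = abs(alpha + beta) <= abs(alpha) + abs(beta)

print(product, norm_sq, alpha * inv, triangle_holds)
```

Because floating point arithmetic is approximate, alpha * inv is only equal to 1 up to rounding error.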

Exercises 

1. Express the following complex numbers in the form x + iy, where x, y 
are real numbers. 

(a) (−1 + 3i)^{-1}          (b) (1 + i)(1 − i) 

(c) (1 + i)i(2 − i)         (d) (i − 1)(2 − i) 

(e) (7 + πi)(π + i)         (f) (2i + 1)πi 

(g) (√2 i)(π + 3i)          (h) (i + 1)(i − 2)(i + 3) 

2. Express the following complex numbers in the form x + iy, where x, y 
are real numbers. 

(a) (1 + i)^{-1}            (b) 1/(3 + i) 

(c) (2 + i)/(2 − i)         (d) 1/(2 − i) 

(e) (1 + i)/i               (f) i/(1 + i) 

(g) 2i/(3 − i)              (h) 1/(−1 + i) 

3. Let α be a complex number ≠ 0. What is the absolute value of α/ᾱ? What 
is the conjugate of ᾱ? 

4. Let α, β be two complex numbers. Show that the conjugate of αβ is ᾱβ̄, 
and that the conjugate of α + β is ᾱ + β̄. 

5. Show that |αβ| = |α| |β|. 

6. Define addition of n-tuples of complex numbers componentwise, and 
multiplication of n-tuples of complex numbers by complex numbers componentwise 
also. If A = (α_1, ..., α_n) and B = (β_1, ..., β_n) are n-tuples of complex 
numbers, define their scalar product ⟨A, B⟩ to be 

α_1β̄_1 + ⋯ + α_nβ̄_n 

(note the complex conjugation!). Prove the following rules: 

SP1. ⟨A, B⟩ = \overline{⟨B, A⟩}. 

SP2. ⟨A, B + C⟩ = ⟨A, B⟩ + ⟨A, C⟩. 

SP3. If α is a complex number, then 

⟨αA, B⟩ = α⟨A, B⟩ and ⟨A, αB⟩ = ᾱ⟨A, B⟩. 

SP4. If A = O then ⟨A, A⟩ = 0, and otherwise, ⟨A, A⟩ > 0. 





(Observe the complex conjugates which appear. The scalar product is defined 
as we have done in order to preserve the positivity of SP4. For that purpose, 
we are willing to allow the complex conjugate in SP1, instead of the rule 
⟨A, B⟩ = ⟨B, A⟩.) 

§2. Polar form 

Let (x, y) = x + iy be a complex number. We know that any point 
in the plane can be represented by polar coordinates (r, θ). We shall now 
see how to write our complex number in terms of such polar coordinates. 

Let θ be a real number. We define the expression e^{iθ} to be 

e^{iθ} = cos θ + i sin θ. 


Thus e^{iθ} is a complex number. 

For example, if θ = π, then e^{iπ} = −1. Also, e^{2πi} = 1, and e^{iπ/2} = i. 
Furthermore, e^{i(θ+2π)} = e^{iθ} for any real θ. 



It will allow us to have a very good geometric interpretation for the 
product of two complex numbers. 

Theorem 1. Let θ, φ be two real numbers. Then 

e^{iθ+iφ} = e^{iθ} e^{iφ}. 

Proof. By definition, we have 

e^{iθ+iφ} = e^{i(θ+φ)} = cos (θ + φ) + i sin (θ + φ). 

Using the addition formula for sine and cosine, we obtain: 

cos θ cos φ − sin θ sin φ + i(sin θ cos φ + sin φ cos θ). 

This is exactly the same expression as the one we obtain by multiplying 
out 

(cos θ + i sin θ)(cos φ + i sin φ). 

Our theorem is proved. 




Theorem 1 justifies our notation, by showing that the exponential of 
complex numbers satisfies the same formal rule as the exponential of real 
numbers. 

Let α = a_1 + ia_2 be a complex number. We define e^α to be 

e^{a_1} e^{ia_2}. 

For instance, let α = 2 + 3i. Then e^α = e^2 e^{3i}. 

Theorem 2. Let α, β be complex numbers. Then 

e^{α+β} = e^α e^β. 

Proof. Let α = a_1 + ia_2 and β = b_1 + ib_2. Then 

e^{α+β} = e^{(a_1+b_1)+i(a_2+b_2)} = e^{a_1+b_1} e^{i(a_2+b_2)}. 

Using Theorem 1, we see that this last expression is equal to 

e^{a_1} e^{b_1} e^{ia_2} e^{ib_2} = e^{a_1} e^{ia_2} e^{b_1} e^{ib_2}. 

By definition, this is equal to e^α e^β, thereby proving our theorem. 

Theorem 2 is very useful in dealing with complex numbers. We shall 
now consider several examples to illustrate it. 




Example 1. Find a complex number whose square is 4e^{iπ/2}. 

Let z = 2e^{iπ/4}. Using the rule for exponentials, we see that z^2 = 4e^{iπ/2}. 

Example 2. Let n be a positive integer. Find a complex number w such 
that w^n = e^{iπ/2}. 

It is clear that the complex number w = e^{iπ/2n} satisfies our requirement. 

In other words, we may express Theorem 2 as follows. Let z_1 = r_1e^{iθ_1} 
and z_2 = r_2e^{iθ_2} be two complex numbers. To find the product z_1z_2, we 
multiply the absolute values and add the angles. Thus 

z_1z_2 = r_1r_2 e^{i(θ_1+θ_2)}. 

In many cases, this way of visualizing the product of complex numbers is 
more useful than that coming out of the definition. 
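
The rule "multiply the absolute values and add the angles" can be checked numerically. The following sketch relies on Python's standard `cmath` module (`cmath.rect` builds r e^{iθ} from r and θ, and `cmath.polar` recovers them); the particular values of r and θ are arbitrary choices made for this illustration.

```python
import cmath
import math

# z1 = 2 e^{i pi/6}, z2 = 3 e^{i pi/3}
z1 = cmath.rect(2.0, math.pi / 6)
z2 = cmath.rect(3.0, math.pi / 3)

# Polar form of the product: absolute value 2 * 3, angle pi/6 + pi/3.
r, theta = cmath.polar(z1 * z2)
print(r, theta, math.pi / 2)

# Example 1: z = 2 e^{i pi/4} has square 4 e^{i pi/2} = 4i.
z = cmath.rect(2.0, math.pi / 4)
print(z * z)
```

Note that `cmath.polar` returns the angle in the interval from −π to π, so sums of angles outside that range are reported modulo 2π, in agreement with e^{i(θ+2π)} = e^{iθ}.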


Exercises 


1. Put the following complex numbers in polar form. 

(a) 1 + i     (b) 1 + i√2     (c) −3     (d) 4i 

(e) 1 − i√2   (f) −5i         (g) −7     (h) −1 − i 

2. Put the following complex numbers in the ordinary form x + iy. 

(a) (b) 7 r( -'''3 

(e) (f) e -'^2 (g) 





3. Let α be a complex number ≠ 0. Show that there are two distinct 
complex numbers whose square is α. 

4. Let α be a complex number ≠ 0. Let n be a positive integer. Show that 
there are n distinct complex numbers z such that z^n = α. Write these complex 
numbers in polar form. 

5. Let α = 1 in Exercise 4. Plot all the complex numbers z such that z^n = 1 
on a sheet of graph paper, for n = 2, 3, 4, and 5. 

6. Let a + bi be a complex number. Find real numbers x, y such that 
(x + iy)^2 = a + bi, expressing x, y in terms of a and b. 

7. Let w be a complex number and suppose that z is a complex number 
such that e^z = w. Describe all complex numbers u such that e^u = w. 

8. What are the complex numbers z such that e^z = 1? 

9. If θ is real, show that 

cos θ = (e^{iθ} + e^{−iθ})/2   and   sin θ = (e^{iθ} − e^{−iθ})/(2i). 

10. A mapping from the real numbers into the complex numbers is called a 
complex valued function. Such a function can be written 

F(t) = f(t) + ig(t), 

where f(t) and g(t) are real valued functions. Define the derivative F'(t) to be 
f'(t) + ig'(t). [This corresponds to the derivative of the vector (f(t), g(t)).] 
The derivative F'(t) (also written dF/dt) is defined only when both f, g are 
differentiable, of course, in which case we say that F is differentiable. 

(a) Let F, G be complex valued functions which are differentiable. 
Define their sum in a natural way, and show that 

d(F + G)/dt = F'(t) + G'(t). 

If c is a complex number, show that 

d(cF)/dt = cF'(t). 

(b) Let F, G be complex valued functions which are differentiable. 
Define their product in a natural way, and show that 

d(FG)/dt = F'(t)G(t) + F(t)G'(t). 

(c) Let F be a differentiable complex valued function. Show that 

d(e^{F(t)})/dt = F'(t)e^{F(t)}. 





11. Let F(t) = f(t) + ig(t) be a complex valued function of t, and assume 
that f, g are continuous. We then say that F is continuous. Define the 
indefinite integral as usual by 

∫F(t) dt = ∫f(t) dt + i∫g(t) dt. 

(a) If there is a differentiable function G such that F(t) = G'(t), show 
that ∫F(t) dt = G(t). 

(b) Let α be a complex number ≠ 0. Show that 

∫e^{αt} dt = e^{αt}/α. 



(c) Let n be an integer. Find 



12. Observe that the theory of bases, linear dependence, and determinants 
applies to n-tuples of complex numbers, and to vector spaces over the complex 
numbers, if one replaces the word “number” in the text by the words “complex 
number”. 


13. Compute the following determinants: 

2 - i -t 
4i ~2i 

(c) Make up 3 X 3 determinants of complex numbers and find their 

value. 



14, Are the vectors (t. —I,2 + t), (W. I,-t). (1,1,2) linearly inde¬ 
pendent over the complex numbers? 



Appendix 1 

Induction 


In the course of several proofs, we have given a “stepwise” argument. 
One can formalise this type of argument, which is called induction. 

Suppose that we wish to prove a certain assertion concerning positive 
integers n. Let A(n) denote the assertion concerning the integer n. To 
prove it for all n, it suffices to prove the following. 

(1) The assertion A(1) is true (i.e. the assertion concerning the integer 
1 is true). 

(2) Assuming the assertion proved for all positive integers ≤ n, prove 
it for n + 1, i.e. prove A(n + 1). 

Step (2) is the procedure which allows us to proceed from one integer 
to the next, and step (1) gives us a starting point. We shall now give 
two examples. 


Theorem 1. For all integers n ≥ 1, we have 

1 + 2 + ⋯ + n = n(n + 1)/2. 

Proof. By induction. The assertion A(n) is the assertion of the theorem. 
When n = 1, it simply states that 

1 = 1(1 + 1)/2, 

and is clearly true. Assume now that the assertion of Theorem 1 is true 
for n. Then: 

1 + 2 + ⋯ + n + (n + 1) = n(n + 1)/2 + (n + 1). 

Putting the expression on the right of the equality sign over a common 
denominator 2, we see that it is equal to 

(n^2 + n + 2n + 2)/2 = (n + 1)(n + 2)/2. 

Hence assuming A(n), we have shown that 

1 + 2 + ⋯ + (n + 1) = (n + 1)(n + 2)/2, 

which is none other than assertion A(n + 1). This proves our theorem. 



Theorem 2. Every n × n matrix can be transformed into a matrix all 
of whose components are equal to 0 except those on the diagonal, which 
may be equal to 0 or 1, by means of the following operations: Interchanging 
two columns, multiplying a column by a non-zero number, adding 
one column to another, performing these operations on rows instead of 
columns. 


Proof. By induction. The assertion is trivially true for 1 × 1 matrices 
since in that case, there is only a diagonal component in the matrix. 
Assume the assertion proved for (n − 1) × (n − 1) matrices. We shall 
prove it for n × n matrices (n ≥ 2). 

Let 

        ( a_11  ⋯  a_1n ) 
    A = (   ⋮         ⋮  ) 
        ( a_n1  ⋯  a_nn ) 

be an n × n matrix. If some element of the first column is not equal to 
zero, say a_i1 ≠ 0, we then interchange the i-th row and first row. We 
can then transform our matrix into another matrix whose 11-component 
is not equal to 0. Multiplying the first row by the inverse of the 
11-component, we can achieve that this 11-component is equal to 1. We 
now multiply the first row by a_i1 (i > 1) and subtract it from the i-th 
row for each integer i satisfying 2 ≤ i ≤ n. This transforms our matrix 
into a matrix of type 

    ( 1  b_12  ⋯  b_1n ) 
    ( 0                ) 
    ( ⋮       ***      ) 
    ( 0                ) 

with 1 in the upper left-hand corner, and 0 in the first column and i-th 
row, i ≥ 2. We now multiply the first column by −b_1j for each integer j 
such that 2 ≤ j ≤ n and add it to the j-th column. We then obtain a 
matrix of type 

    ( 1  0  ⋯  0 ) 
    ( 0          ) 
    ( ⋮    ***   ) 
    ( 0          ) 

having a component 1 in the upper left-hand corner, and 0 otherwise in 
the first column and first row. 

If every element of the first column was equal to 0 in the first place, 
and some element of the first row was not equal to 0, we carry out the 
preceding arguments on the transpose of the matrix, to transform our 
original matrix again into one of type 

        ( 1  0  ⋯  0 ) 
    B = ( 0          ) 
        ( ⋮    ***   ) 
        ( 0          ) 

If every element of the first column and first row of the original matrix 
was equal to 0, then we deal with a matrix of type 

        ( 0  0  ⋯  0 ) 
    B = ( 0          ) 
        ( ⋮    ***   ) 
        ( 0          ) 

In either one of the above cases, we end up with a matrix all of whose 
components on the first row and first column are equal to 0 except the 
11-component. 

The matrix denoted by *** in all cases is an (n − 1) × (n − 1) 
matrix. Hence by induction hypothesis, using the three operations on the 
rows and columns of ***, we can transform the matrix *** into a matrix 
all of whose components are equal to 0, except those on the diagonal, 
which are equal to 0 or 1. We can view an operation on rows or columns 
of *** as arising from an operation on the corresponding rows or columns 
of B (since the operations will not affect the components of B on the first 
column or first row, these being all equal to 0 except in the upper left-hand 
corner). We have now transformed our matrix into one of the desired type, 
thereby concluding the proof. 
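
The inductive proof is in effect an algorithm, and the step from n − 1 to n becomes a loop over the diagonal positions. The sketch below is one possible rendering under stated assumptions: the name `reduce_to_diagonal` is hypothetical, exact rational arithmetic is used through `fractions.Fraction`, and "subtract a multiple of a row (or column)" is treated as a single step, although in terms of the operations listed in Theorem 2 it is a combination of a multiplication by a non-zero number and an addition.

```python
from fractions import Fraction


def reduce_to_diagonal(matrix):
    """Bring a square matrix to diagonal form with entries 0 or 1
    by row and column operations, following the inductive proof."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    for k in range(n):  # work on the block with rows and columns k, ..., n-1
        rows = [i for i in range(k, n) if a[i][k] != 0]
        cols = [j for j in range(k, n) if a[k][j] != 0]
        if rows:
            # Non-zero entry in the first column of the block: interchange rows.
            i = rows[0]
            a[k], a[i] = a[i], a[k]
        elif cols:
            # Otherwise a non-zero entry in the first row: interchange columns.
            j = cols[0]
            for row in a:
                row[k], row[j] = row[j], row[k]
        else:
            # First row and column of the block are zero: diagonal entry stays 0.
            continue
        pivot = a[k][k]
        a[k] = [x / pivot for x in a[k]]          # make the corner equal to 1
        for i in range(k + 1, n):                 # clear the column below it
            c = a[i][k]
            a[i] = [x - c * y for x, y in zip(a[i], a[k])]
        for j in range(k + 1, n):                 # clear the row to its right
            c = a[k][j]
            for row in a:
                row[j] -= c * row[k]
    return a


result = reduce_to_diagonal([[0, 2, 1], [0, 0, 0], [3, 1, 4]])
print([[int(x) for x in row] for row in result])
```

For the matrix in the usage line, two diagonal entries end up equal to 1 and one equal to 0.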


Exercises 


1. Prove that for every positive integer n, we have 

1^2 + 2^2 + ⋯ + n^2 = n(n + 1)(2n + 1)/6. 

2. Let \binom{n}{k} denote the binomial coefficient, 

\binom{n}{k} = n!/(k!(n − k)!), 

where n, k are integers ≥ 0, 0 ≤ k ≤ n, and 0! is defined to be equal to 1. 
Prove the following assertions. 

(a) \binom{n}{k} = \binom{n}{n−k} 

(b) \binom{n}{k−1} + \binom{n}{k} = \binom{n+1}{k}   (for k > 0) 




3. Let P_1, ..., P_n be n points in m-space. Show that any convex set which 
contains P_1, ..., P_n also contains all linear combinations 

x_1P_1 + ⋯ + x_nP_n, 

such that 0 ≤ x_i ≤ 1 for all i, and x_1 + ⋯ + x_n = 1. [Hint: Use induction, 
and the fact that if x_n ≠ 1, then the above linear combination is equal to 

(1 − x_n) ( (x_1/(1 − x_n)) P_1 + ⋯ + (x_{n−1}/(1 − x_n)) P_{n−1} ) + x_nP_n.] 

4. Just for fun (you won’t need induction), show that the set consisting of 
all linear combinations as in Exercise 3 is itself convex. 

5. Let f_1, ..., f_n be differentiable functions of one variable. Let f' denote 
the derivative of a function f. Prove that the derivative of the product f_1 ⋯ f_n 
is equal to 

(f_1 ⋯ f_n)' = f_1' f_2 ⋯ f_n + f_1 f_2' ⋯ f_n + ⋯ + f_1 ⋯ f_{n−1} f_n'. 



Appendix 2 

ε and δ again 

Let V be a vector space. A norm on V is a function, which to each 
element v of V associates a number, denoted by ||v||, satisfying the following 
properties: 

N1. If v ≠ 0 then ||v|| > 0, and ||0|| = 0. 

N2. If v_1, v_2 are elements of V, then 

||v_1 + v_2|| ≤ ||v_1|| + ||v_2||. 

N3. If c is a number and v an element of V then 

||cv|| = |c| ||v||. 

In this book, we dealt with norms arising from scalar products, but it is 
convenient to study norms independently of scalar products. A vector 
space, together with a norm, is called a normed vector space. 

Let V, W be two normed vector spaces. Let S be a (non-empty) subset 
of V, and let F: S → W be a mapping of S into W. Let v_0 be an element 
of V and w an element of W. We assume that S comes arbitrarily close to 
v_0, namely given a number ε > 0, there exists v in S such that ||v − v_0|| < ε. 
We shall say that F(v) approaches the limit w as v approaches v_0 if the 
following condition is satisfied. 

Given ε > 0 there exists δ > 0 such that whenever v is in S and 

||v − v_0|| < δ then ||F(v) − w|| < ε. 

We can also rephrase this in the usual manner as follows: We write 

lim_{u→0} F(v_0 + u) = w 

and say that the limit of F(v_0 + u) is w, as u approaches 0, if, given ε > 0, 
there exists δ > 0 such that whenever u is in V, and v_0 + u lies in S, 
and ||u|| < δ then 

||F(v_0 + u) − w|| < ε. 

It is then possible to prove in exactly the same manner as for functions 
of one variable the various theorems concerning the formal properties of 
limits. Of course, the theorem concerning the limit of a product will hold 
only when W = R, and we can multiply functions. It should also be 
pointed out that a similar theorem holds when we take the dot product. 



The formulation is essentially the same as the ordinary one, and we state 
it as an example (but leave the proof as an exercise, using the Schwarz 
inequality). 

Let W be a vector space with a scalar product, and let V be a normed 
vector space. Let S be a subset of V, and let F, G: S → W be two maps of 
S into W. Let v_0 be an element of V. Let 

w = lim_{v→v_0} F(v) 

and 

w' = lim_{v→v_0} G(v). 

Let F · G be the function on S defined by (F · G)(v) = F(v) · G(v). Then 

w · w' = lim_{v→v_0} (F · G)(v). 

In dealing with vectors, we considered maps into R^n. In that case, we 
have made use of the following result. 

Let V be a normed vector space, and S a subset of V. Let F: S → R^n 
be a mapping, and let F_1, ..., F_n be its coordinate functions, i.e. 

F(v) = (F_1(v), ..., F_n(v)). 

Let w = (w_1, ..., w_n) be an element of R^n and v_0 an element of V. Then 

lim_{v→v_0} F(v) = w 

if and only if, for every i = 1, ..., n we have 

lim_{v→v_0} F_i(v) = w_i. 

Proof. Assume first that lim_{v→v_0} F(v) = w. Given ε > 0, find δ > 0 as 
in the definition of limits. Then whenever ||v − v_0|| < δ we have 
||F(v) − w|| < ε. Note that 

F(v) − w = (F_1(v) − w_1, ..., F_n(v) − w_n). 

If X = (x_1, ..., x_n) is an n-tuple, then |x_i| ≤ ||X||. Hence for all i, 

|F_i(v) − w_i| ≤ ||F(v) − w|| < ε. 

This proves that lim_{v→v_0} F_i(v) = w_i. 





Conversely, assume that these limits hold for each i. Given ε > 0, 
find δ > 0 such that whenever ||v − v_0|| < δ we have 

|F_i(v) − w_i| < ε/√n 

for all i = 1, ..., n. Then 

||F(v) − w|| = √(Σ_{i=1}^{n} (F_i(v) − w_i)^2) 

             < √(n · ε^2/n) = ε. 

This proves that lim_{v→v_0} F(v) = w. 

Roughly speaking, what the preceding assertion means is that when¬ 
ever two vectors are close together, their coordinates are also close to¬ 
gether. Of course we have used this many times in the course of the book, 
but the purpose of the present appendix is to show how such notions can 
be reduced to notions concerning only numbers and elementary notions 
of logic. 

Finally, let V, W be two normed vector spaces. Let S be a subset of 
V, and let v_0 be an element of S. Let F: S → W be a mapping. We say 
that F is continuous at v_0 if 

lim_{v→v_0} F(v) = F(v_0), 

or if v_0 is an isolated point of S. (You define what “isolated” means.) 
Again the theorems concerning continuous maps (sums, products if 
relevant, composition) are proved essentially without any change from the 
case of functions of one variable. We leave the proofs as exercises to the 
reader. (The point is that continuity is defined in terms of limits, and thus 
any property of limits immediately extends to the analogous property 
concerning continuity.) 

In discussing error terms as in Chapter III, it is convenient to introduce 
certain abbreviations, and a convenient terminology to deal with orders of 
magnitude. We shall give here the relevant definitions. 

We consider functions defined on subsets of n-space. A function f will 
be said to be defined for small values of H if there exists a number c > 0 
such that f is defined for all H in n-space such that H ≠ O and ||H|| < c. 
We shall say that f is o(H) (which we read “little oh of H”) if 

lim_{||H||→0} f(H)/||H|| = 0. 




The following statements are then easy to prove, and are left as exercises. 
We assume that all functions involved are defined for small values of H. 

(1) If f_1, f_2 are two functions which are both o(H), then so is f_1 + f_2. 

(2) If f is o(H), and C is a number > 0, then Cf is also o(H). 

(3) If f is o(H), and if g is a function such that 

|g(H)| ≤ |f(H)| 

for all sufficiently small values of H, then g is also o(H). 

(4) A function f is o(H) if and only if f can be written in the form 

f(H) = ||H|| g(H) 

with some function g such that 

lim_{||H||→0} g(H) = 0. 

In our definition of differentiability at a point, we could say that f is 
differentiable at X if and only if 

f(X + H) = f(X) + grad f(X) · H + o(H). 

Also observe that if H = (h_1, ..., h_n) we have 

|h_i| ≤ ||H|| 

for every i = 1, ..., n. Using the o-notation, it is possible to rewrite in 
abbreviated form some of the proofs of Chapter III, and later proofs 
involved in Taylor's formula. 
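
The condition f(X + H) = f(X) + grad f(X) · H + o(H) can be observed numerically. In the sketch below, the function f(x, y) = x^2 y, the point X = (1, 2) and the increments H = (t, t) are arbitrary choices made for this illustration, and `remainder_ratio` is a hypothetical helper. It computes the error term divided by ||H||, which should tend to 0 with H.

```python
import math


def f(x, y):
    return x * x * y


def grad_f(x, y):
    """Gradient of f(x, y) = x^2 y, computed by hand: (2xy, x^2)."""
    return (2 * x * y, x * x)


def remainder_ratio(x, y, h, k):
    """|f(X + H) - f(X) - grad f(X) . H| / ||H|| for X = (x, y), H = (h, k)."""
    gx, gy = grad_f(x, y)
    remainder = f(x + h, y + k) - f(x, y) - (gx * h + gy * k)
    return abs(remainder) / math.hypot(h, k)


# Shrink H = (t, t) towards O and watch the ratio shrink with it.
ratios = [remainder_ratio(1.0, 2.0, t, t) for t in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

For this f the remainder along H = (t, t) is 4t^2 + t^3, so the ratio behaves like a constant times t. A numerical table of this kind suggests, but does not prove, that the error term is o(H).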

It is also clearly possible to define the o-notation for mappings with 
values in a normed vector space, and use the o-notation in the general 
formulation of differentiability as in Chapter XIII. Here again, it would 
be a good exercise to rewrite the relevant passages using this notation. 



Appendix 3 


Sine, Cosine, and Angle 


In our two courses in calculus, the only notions for which geometric 
definitions were given were those of sine and cosine. As for angle, no 
definition was ever given, we just drew pictures. We have shown how to 
give definitions for all other notions in terms involving only properties of 
(real) numbers. Such definitions are called analytic. There is of course 
nothing wrong about using pictures, and it would be insane to have 
inhibitions about them, but it is reasonable to ask whether it is possible to 
develop the theory of angles, sine, and cosine without appeal to geometric 
intuition, i.e. give for these notions purely analytic definitions, and prove 
their properties purely analytically. This is possible, but involves a fair 
amount of theory which it is impossible to present in elementary courses, 
for obvious reasons (at least, granting the present sequence of courses in 
elementary schools). Still, it now seems worthwhile to show how the 
theory can be developed. To do so, we shall use theorems proved in this 
course, concerning both differentiation and linear algebra. These theorems 
and their proofs did not depend on the notions of sine, cosine, and angle, 
so that our logic is not circular. At the end, we shall recover all the usual 
properties. 

By number, throughout this appendix, we shall mean real number. The 
proofs of §1 use only calculus, and except for Proposition 1, use only results 
proved in the First Course. The proofs of §2 use only results of linear 
algebra proved in the present volume, except for the last proposition, 
which relates the algebra and the calculus. 


§1. The functions sin and cos 

Proposition 1. There exists a unique pair of functions f, g defined for 
all numbers, which are differentiable, such that 

f' = g and g' = −f, 

and such that f(0) = 0 and g(0) = 1. 


Proof. There are a number of ways of proving the existence. One of 
them is to prove that the series 

f(x) = Σ_{n=0}^{∞} (−1)^n x^{2n+1}/(2n + 1)! 

and 

g(x) = Σ_{n=0}^{∞} (−1)^n x^{2n}/(2n)! 



can be differentiated term by term. This is proved in every course in 
advanced calculus, and we shall omit the proof here. It is then clear that 
these functions satisfy our requirements. 
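
The partial sums of the two series are easy to evaluate. In the sketch below, `sin_series` and `cos_series` are hypothetical names for partial sums with a fixed number of terms, and the comparison with `math.sin` and `math.cos` assumes that the library functions agree with the functions f, g of the proposition. It is a numerical illustration only and plays no part in the logic of the appendix.

```python
import math


def sin_series(x, terms=20):
    """Partial sum of the series for f: sum of (-1)^n x^(2n+1) / (2n+1)!."""
    return sum(
        (-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
        for n in range(terms)
    )


def cos_series(x, terms=20):
    """Partial sum of the series for g: sum of (-1)^n x^(2n) / (2n)!."""
    return sum(
        (-1) ** n * x ** (2 * n) / math.factorial(2 * n)
        for n in range(terms)
    )


x = 1.0
print(sin_series(x), math.sin(x))
print(cos_series(x), math.cos(x))

# Initial conditions f(0) = 0 and g(0) = 1.
print(sin_series(0.0), cos_series(0.0))

# The relation f^2 + g^2 = 1, up to rounding error.
print(sin_series(x) ** 2 + cos_series(x) ** 2)
```

Differentiating the n-th term of the first series gives exactly the n-th term of the second, which is what term by term differentiation uses to obtain f' = g.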

Note. The fact about series quoted just now is rather easy to prove. It 
will be the only fact used in this appendix for which no proof has been 
given in our Courses. In this section, we shall make repeated application 
of the theorems concerning increasing and decreasing functions proved in 
the First Course. 

As for uniqueness, let f_1, g_1 be functions such that 

f_1' = g_1 and g_1' = −f_1, 

and assume only that f_1(0)^2 + g_1(0)^2 = 1. 

Differentiating the function f_1^2 + g_1^2, one finds 0. Hence this function 
is constant, and hence is equal to the constant 1. 

Next we differentiate the functions f g_1 − f_1 g and f f_1 + g g_1. We find 0 
in each case. Hence there exist numbers a, b such that 

f g_1 − f_1 g = a, 
f f_1 + g g_1 = b. 

We multiply the first equation by f, the second by g, and add. We multiply 
the first equation by g, the second by f, and subtract. We obtain the two 
equations: 

(*)    g_1 = af + bg, 
       f_1 = bf − ag. 

If we assume that f_1, g_1 satisfy the hypotheses of the proposition, then 
evaluating these functions at 0, we find the values for a, b to be a = 0 
and b = 1. This proves our uniqueness statement. 

In view of Proposition 1, we define the functions f and g in that 
proposition to be the sine and cosine functions respectively, and denote them by 
sin and cos. 


Proposition 2. For all numbers x, y we have: 

(1)    sin^2 x + cos^2 x = 1, 

(2)    sin (−x) = −sin x, 

(3)    cos (−x) = cos x, 

(4)    sin (x + y) = sin x cos y + cos x sin y, 

(5)    cos (x + y) = cos x cos y − sin x sin y. 





Proof. The first formula has already been proved. To prove each pair 
of succeeding formulas, we make a suitable choice of functions f_1, g_1 and 
apply equations (*) of the proof of uniqueness. For instance, to prove 
(2) and (3), we let 

f_1(x) = cos (−x) and g_1(x) = sin (−x). 

Then we find numbers a, b as in the uniqueness proof such that equations (*) 
are satisfied. Taking the values of these functions at 0, we now find that 
b = 0 and a = −1. This proves what we want. To prove (4) and (5), 
we let y be a fixed number, and let 

f_1(x) = sin (x + y) and g_1(x) = cos (x + y). 

We determine the constants a, b as before, in equations (*), and find 
a = −sin y, b = cos y. Formulas (4) and (5) then drop out. 

Since the functions sin and cos are differentiable, and since their deriva¬ 
tives are expressed in terms of each other, it follows that they arc in¬ 
finitely differentiable. In particular, they are continuous. 

Since sin^ x + cos^ x = 1 for all x, it follows that the values of sin and 
cos lie between — 1 and 1. Of course, we do not yet know that sin and cos 
take on all such values. This will be proved later. 

Since the derivative of sin x at 0 is equal to I, and since this derivative 
is continuous, it follows that the derivative of sin x (which is cos x) is 
> 0 for all numbers x in some open interval containing 0. Hence sin is 
strictly increasing in such an interval, and is strictly positive for all x > 0 
in such an interval. 

We shall now prove that there is a number x > 0 such that sin x = 1. In 
view of the relation between sin and cos, this amounts to proving that 
there exists a number x > 0 such that cos x = 0. 

Suppose that no such number exists. Since cos is continuous, we
conclude that cos x cannot be negative for any value of x > 0 (by the
intermediate value theorem). Hence sin is strictly increasing for all x > 0, and
cos is strictly decreasing for all x > 0. Let a > 0. Then

$0 < \cos 2a = \cos^2 a - \sin^2 a < \cos^2 a$.

By induction, one sees that $\cos(2^n a) < (\cos a)^{2^n}$ for all integers n > 0.
Hence $\cos(2^n a)$ approaches 0 as n becomes large, because 0 < cos a < 1.
Since cos is strictly decreasing for x > 0, it follows that cos x approaches 0
as x becomes large, and hence sin x approaches 1. In particular, there
exists a number b > 0 such that

$\cos b < \tfrac{1}{4}$ and $\sin b > \tfrac{1}{2}$.

Then $\cos 2b = \cos^2 b - \sin^2 b < \tfrac{1}{16} - \tfrac{1}{4} < 0$, contradiction,
proving that there is a number x > 0 such that sin x = 1, cos x = 0.

The set of numbers x > 0 such that cos x = 0 (or equivalently,
sin x = 1) is non-empty, bounded from below. Let c be its greatest lower
bound. By continuity, we must have cos c = 0. We define $\pi$ to be the
number 2c. Thus $c = \pi/2$. (We follow the Greeks. Unfortunately, it
would be more practical to define $\pi$ as being equal to 4c. This would get
rid of an extraneous factor of 2 appearing in almost all formulas in
mathematics involving $\pi$. However, it is too late in history to change the
notation.) It is clear that c > 0, and by definition of the greatest lower bound,
there is no number x such that

$0 \le x < \pi/2$

and such that cos x = 0, or sin x = 1.
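
The definition of $\pi/2$ as the least positive zero of cos can be illustrated numerically. The following is only a sketch: it assumes that Python's library function `math.cos` agrees with the function cos constructed in the text, and the helper name `least_positive_zero_of_cos` is hypothetical, introduced for illustration.

```python
import math

def least_positive_zero_of_cos(step=1e-3, tol=1e-12):
    """Approximate the least x > 0 with cos x = 0.

    Assumption: math.cos plays the role of the function cos of the text.
    We walk to the right in small steps while cos stays positive, then
    bisect the short interval on which cos changes sign.
    """
    lo = 0.0
    while math.cos(lo + step) > 0:
        lo += step
    hi = lo + step
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if math.cos(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

c = least_positive_zero_of_cos()
pi_estimate = 2 * c  # the text defines pi to be the number 2c
```

The small step size matters: the scan finds the least zero only because the step is much shorter than the distance between consecutive zeros of cos.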

Proposition 3. For all x, we have

$\cos x = \sin\left(x + \frac{\pi}{2}\right)$ and $\sin x = \cos\left(x - \frac{\pi}{2}\right)$.

Proof. Use (4) and (5) in Proposition 2.

Proposition 4. The functions sin and cos behave as described in the
following table. We use "s.i." and "s.d." to abbreviate "strictly increasing"
and "strictly decreasing" respectively.

        0 ≤ x ≤ π/2         π/2 ≤ x ≤ π         π ≤ x ≤ 3π/2        3π/2 ≤ x ≤ 2π

sin     s.i. from 0 to 1    s.d. from 1 to 0    s.d. from 0 to -1   s.i. from -1 to 0

cos     s.d. from 1 to 0    s.d. from 0 to -1   s.i. from -1 to 0   s.i. from 0 to 1


Proof. The behavior in the first column has already been proved in the
course of our discussion concerning the definition of $\pi/2$. Consider the
second column. The behavior of sin in the indicated interval comes from
Proposition 3 (it is the same as the behavior of cos in the preceding column).
In that interval, the derivative of cos is therefore negative, and cos is
strictly decreasing. Furthermore, cos decreases from 0 to -1, since we
must always have $\sin^2 x + \cos^2 x = 1$. We can now argue in a similar
way concerning the behavior of sin in the interval of the third column,
then the behavior of cos in the third column, and finally the fourth column.
























The arguments are entirely similar to those used in going from the first 
to the second column, and can be left as exercises to the reader. 

A function f is called periodic, and a number s is called a period, if
f(x + s) = f(x) for all numbers x.

Proposition 5. The functions sin and cos are periodic. The numbers
$2n\pi$ (n equal to an integer) are periods, and every period is equal to $2n\pi$
for some integer n.

Proof. From the fourth column of Proposition 4 (or by an easy direct
argument, using (4) and (5) of Proposition 2), we know that $\sin 2\pi = 0$
and $\cos 2\pi = 1$. Hence

$\sin(x + 2\pi) = \sin x \cos 2\pi + \cos x \sin 2\pi = \sin x$,
$\cos(x + 2\pi) = \cos x \cos 2\pi - \sin x \sin 2\pi = \cos x$.

By induction, it follows that $2n\pi$ is a period for any integer n.

Let $s_1$, $s_2$ be periods for sin. Then one sees at once that $s_1 + s_2$ and
$-s_1$ are periods. Let s be a period for sin. Consider the set of integers m
such that $2m\pi \le s$. Taking m sufficiently large negative shows that this
set is not empty, and is bounded from above by $s/2\pi$. Let n be its least
upper bound. Then n is an integer, and $2n\pi \le s$ but $2(n + 1)\pi > s$.
Let $t = s - 2n\pi$. Then t is a period, and $0 \le t < 2\pi$. We must have

$\sin(0 + t) = \sin 0 = 0$,

$\cos(0 + t) = \cos 0 = 1$.

From the table in Proposition 4, we see that this is possible only if t = 0,
as was to be shown.
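
The reduction step of this proof, which replaces a number s by $t = s - 2n\pi$ with $0 \le t < 2\pi$, can be sketched in code. This is an illustration only: the function name `reduce_by_period` is hypothetical, and floating-point arithmetic stands in for exact real numbers.

```python
import math

def reduce_by_period(s):
    """Return (n, t) with s = 2*n*pi + t, n an integer, 0 <= t < 2*pi.

    n is the largest integer m such that 2*m*pi <= s, as in the proof.
    Rounding error may matter when s is extremely close to a multiple
    of 2*pi; this sketch ignores that case.
    """
    n = math.floor(s / (2 * math.pi))
    t = s - 2 * n * math.pi
    return n, t

n, t = reduce_by_period(7.0)
# sin and cos take the same values at s = 7.0 and at the reduced number t
same_sin = math.isclose(math.sin(7.0), math.sin(t), abs_tol=1e-12)
same_cos = math.isclose(math.cos(7.0), math.cos(t), abs_tol=1e-12)
```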

The preceding propositions give us all the usual properties of sin and cos.
We recall that a point (a, b) in 2-space is said to lie on the unit circle
if $a^2 + b^2 = 1$. We see that for any number x, the point (cos x, sin x)
lies on the unit circle.

Proposition 6. Given a point (a, b) on the unit circle in 2-space, there
exists a unique number t such that $0 \le t < 2\pi$ and such that a = cos t,
b = sin t.

Proof. We consider four different cases, according as a, b are $\ge 0$ or $\le 0$.
In any case, both a and b are between -1 and 1.

Consider for instance the case where $-1 \le a \le 0$ and $0 \le b \le 1$.
From the table in Proposition 4, we see that there is only one possible
column in which we could find a value of t satisfying our requirements, and
that is the second column.





By the intermediate value theorem, we know that there is one number t
such that

$\frac{\pi}{2} \le t \le \pi$

and sin t = b. Since $\cos^2 t = 1 - \sin^2 t = 1 - b^2 = a^2$, and since
both cos t and a are $\le 0$ in this interval, it follows that we must also have
cos t = a. This proves what we want.

The other cases are treated in an entirely similar way, and can be left 
to the reader. 

We have now obtained a mapping from the real numbers into the plane $R^2$,
such that the image of the mapping is equal to the unit circle, and such that
each point on the unit circle is the image of exactly one number in the interval
$0 \le x < 2\pi$.

Using the periodicity of sin and cos, given a number c, we can conclude
that every point on the unit circle is the image of exactly one number in
the interval

$c \le x < c + 2\pi$.

Hence we conclude: If $x_1$, $x_2$ are two numbers such that

$(\cos x_1, \sin x_1) = (\cos x_2, \sin x_2)$,

then there exists an integer n such that $x_2 = x_1 + 2n\pi$.
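
As a numerical illustration of Proposition 6, the number t attached to a point of the unit circle can be computed with the library function `math.atan2`. This is a sketch under the assumption that `math.cos` and `math.sin` agree with the functions of the text; the name `angle_of_point` is hypothetical and not part of the original.

```python
import math

def angle_of_point(a, b):
    """Return t with 0 <= t < 2*pi, a = cos t, b = sin t.

    Assumes (a, b) lies on the unit circle. math.atan2 returns a value
    in the interval (-pi, pi], so negative values are shifted by the
    period 2*pi.
    """
    t = math.atan2(b, a)
    if t < 0:
        t += 2 * math.pi
    return t

# A point with a <= 0 and b >= 0, the case treated in the proof:
a, b = -0.6, 0.8
t = angle_of_point(a, b)
```

For this sample point, t lies between $\pi/2$ and $\pi$, which corresponds to the second column of the table in Proposition 4.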

§2. Angles 

For most of this section, we discuss the geometry of 2-space. This
discussion is logically independent of calculus, and concerns only linear
algebra. At the end we relate the geometry with our sin and cos functions.

Let V be a 2-dimensional vector space (over the real numbers), with
a (positive definite) scalar product. By the unit circle in V we shall mean
the set of all elements v of V such that $\|v\| = 1$. Thus the unit circle is
just the set of unit vectors in V.

Let A be a non-zero element of V. The set of all elements tA, where t
is a number $\ge 0$, will be called a half-line, determined by A. If E is the
unit vector in the direction of A, i.e.

$E = \frac{A}{\|A\|}$,

then one sees at once that E determines the same half-line as A, and is
the unique unit vector in V which does so. Thus to determine a half-line
it is necessary and sufficient to specify the unit vector having the same
direction.





We define an angle to be an ordered pair of half-lines $(L_1, L_2)$. If P is
the unique point on the unit circle lying on the half-line $L_1$, and Q is the
unique point on the unit circle lying on the half-line $L_2$, then we denote
the angle $(L_1, L_2)$ also by the symbols $\angle PQ$.

Let $\mathcal{B}$ and $\mathcal{B}'$ be two bases of V. If $F: V \to V$ is a linear map, and if
we let M be its associated matrix relative to $\mathcal{B}$, $\mathcal{B}$ (or as we also say,
relative to $\mathcal{B}$), and let M' be the associated matrix of F relative to $\mathcal{B}'$,
then we know that there exists a matrix N such that $M' = N M N^{-1}$.
Using the rule concerning the product of determinants, we conclude that
the determinant of M is equal to the determinant of M'. Hence the
determinant does not depend on the choice of bases. We call it the
determinant of F.

We recall that an orthogonal map is a linear map which preserves
lengths (or scalar products).

Proposition 7. The determinant of an orthogonal map F is equal to 
1 or —1. 

Proof. Let $\{v_1, v_2\}$ be an orthonormal basis of V. Let a, b, c, d be
numbers such that

$F(v_1) = a v_1 + b v_2$,
$F(v_2) = c v_1 + d v_2$.

Since F is orthogonal, the lengths of $F(v_1)$ and $F(v_2)$ are equal to 1, and
these two elements are perpendicular. This means that

$a^2 + b^2 = 1$, $c^2 + d^2 = 1$, $ac + bd = 0$.

Hence

$1 = (a^2 + b^2)(c^2 + d^2) = a^2 c^2 + a^2 d^2 + b^2 c^2 + b^2 d^2$,

$0 = (ac + bd)^2 = a^2 c^2 + 2abcd + b^2 d^2$.

From these we obtain

$(ad - bc)^2 = 1$,

thereby proving that the determinant squared is equal to 1. Hence the
determinant itself is 1 or -1.
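
A small numerical check of Proposition 7, given here only as a sketch. It works in the plane with the standard basis (an assumption made for illustration), stores a matrix as a pair of rows, and uses hypothetical helper names `det2` and `is_orthogonal`.

```python
import math

def det2(m):
    """Determinant of a 2 x 2 matrix given as ((p, q), (r, s))."""
    (p, q), (r, s) = m
    return p * s - q * r

def is_orthogonal(m, tol=1e-12):
    """Check that the columns of m are unit vectors and perpendicular."""
    (p, q), (r, s) = m
    return (abs(p * p + r * r - 1) < tol
            and abs(q * q + s * s - 1) < tol
            and abs(p * q + r * s) < tol)

theta = 0.7  # arbitrary sample value
c, s = math.cos(theta), math.sin(theta)
rotation = ((c, -s), (s, c))    # orthogonal, determinant 1
reflection = ((c, s), (s, -c))  # orthogonal, determinant -1
```

Both sample matrices are orthogonal, and their determinants are 1 and -1 respectively, the only two values the proposition allows.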

We define a rotation to be an orthogonal map whose determinant is 
equal to 1. 

Proposition 8. Let $\mathcal{B}$ be an orthonormal basis of V, and let F be a linear
map of V into itself. Then F is a rotation if and only if there exist numbers
a, b such that $a^2 + b^2 = 1$, and such that the matrix of F relative to $\mathcal{B}$ is

$\begin{pmatrix} a & -b \\ b & a \end{pmatrix}$.

If F is a rotation, a, b are as above, and $\mathcal{B}'$ is another orthonormal basis of
V such that the linear map sending $\mathcal{B}$ on $\mathcal{B}'$ is a rotation, then the matrix
of F relative to $\mathcal{B}'$ is also

$\begin{pmatrix} a & -b \\ b & a \end{pmatrix}$.

Proof. Assume first that F is a rotation, and let us keep the notation of
Proposition 7. We have ad - bc = 1. Hence

$0 = 1 - ad + bc$,
$0 = -ac - bd$.

Multiplying the first of these equations by a, the second by b and adding
yields

$0 = a - a^2 d - b^2 d$.

Since $a^2 + b^2 = 1$, we get a = d. From this it follows at once that
c = -b, and our first assertion is proved. Conversely, it is trivially verified
that a linear map represented by a matrix of the given type is a rotation.

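Let now $\mathcal{B}' = \{w_1, w_2\}$ be another orthonormal basis of V, and assume
that it differs from $\{v_1, v_2\}$ by a rotation. By what we have just proved,
there exist numbers x, y such that $x^2 + y^2 = 1$, and

$v_1 = x w_1 + y w_2$,
$v_2 = -y w_1 + x w_2$.

Thus the matrix

$N = \begin{pmatrix} x & -y \\ y & x \end{pmatrix}$

is by definition the matrix of the change of basis from $\mathcal{B}$ to $\mathcal{B}'$. Since

$N^{-1} = \begin{pmatrix} x & y \\ -y & x \end{pmatrix}$

(as one sees by a direct computation), it follows that the matrix of F
relative to $\mathcal{B}'$ is

$\begin{pmatrix} x & -y \\ y & x \end{pmatrix} \begin{pmatrix} a & -b \\ b & a \end{pmatrix} \begin{pmatrix} x & y \\ -y & x \end{pmatrix}$,

and a direct computation shows that this is the same matrix as

$\begin{pmatrix} a & -b \\ b & a \end{pmatrix}$,

thereby proving our proposition.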

Proposition 9. Let F, G be two rotations. Then $F \circ G$ is a rotation.
There exists an inverse $F^{-1}$ for F, and $F^{-1}$ is a rotation.

Proof. The first assertion follows directly from the product rule for
determinants. The second follows from the equations

$1 = D(I) = D(F F^{-1}) = D(F) D(F^{-1})$,

together with the assumption that D(F) = 1, provided we know that the
inverse exists. The fact that an orthogonal linear map has an inverse, and
that this inverse is orthogonal will be left as an exercise.

Let $E_1$ be a unit vector in V. The subspace of V which is perpendicular
to $E_1$ has dimension 1 (because V has dimension 2). If $E_2$ is a unit vector
generating this subspace, then any other vector perpendicular to $E_1$ can
be written $tE_2$ for some number t. Hence there exist exactly two unit
vectors in V perpendicular to $E_1$, and these are $E_2$, $-E_2$.

Proposition 10. Let P, A be unit vectors in V. Then there exists a
unique rotation F such that F(P) = A.

Proof. Let $F_1$, $F_2$ be rotations mapping P on A. Then

$F_1^{-1}(F_2(P)) = P$.

Hence $F_1^{-1} F_2$ is a rotation which leaves P fixed. If we can prove that
such a rotation is the identity map, then we conclude that $F_1^{-1} F_2 = I$
and $F_2 = F_1$, as desired. Let G be a rotation leaving P fixed. Let E be
a unit vector perpendicular to P. Then {P, E} is a basis for V. Since G
is orthogonal, it follows that G(E) is perpendicular to P, hence is equal
to E or -E. If G(E) were equal to -E, then the determinant of G would
be equal to -1, which is impossible. Hence G(E) = E. Hence G leaves
both P, E fixed, and since G is linear, it must be the identity map. We
have therefore proved our uniqueness statement.

As for existence, let E be as above, and let a, b be numbers such that

$A = aP + bE$.

There exists a unique linear map F such that F(P) = A and
F(E) = -bP + aE. Since A is a unit vector, we have $a^2 + b^2 = 1$, and hence
the determinant of F is 1. Furthermore, F(P) and F(E) are perpendicular
(their scalar product is obviously 0). Hence F is a rotation, and has the
desired effect.
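
The existence argument is constructive, and can be sketched in coordinates. The sketch below assumes that V is the plane with its standard scalar product, and takes E to be P turned a quarter turn counterclockwise; the function names `rotation_taking` and `apply` are hypothetical.

```python
def rotation_taking(p, target):
    """Matrix ((a, -b), (b, a)) of the rotation F with F(p) = target.

    p and target are assumed to be unit vectors in the plane.
    E = (-p[1], p[0]) is a unit vector perpendicular to p, and a, b are
    the numbers with target = a*P + b*E, as in the proof.
    """
    e = (-p[1], p[0])
    a = target[0] * p[0] + target[1] * p[1]  # scalar product of target and P
    b = target[0] * e[0] + target[1] * e[1]  # scalar product of target and E
    return ((a, -b), (b, a))

def apply(m, v):
    """Apply the 2 x 2 matrix m (a pair of rows) to the vector v."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

P = (0.6, 0.8)
A = (-0.8, 0.6)
F = rotation_taking(P, A)
image = apply(F, P)  # should agree with A up to rounding
```

Because E is obtained from P by a quarter turn, the basis {P, E} has the same orientation as the standard basis, so by Proposition 8 the same matrix represents F in standard coordinates.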

Our next task is to define the sine and cosine of an angle. For this we 
must consider an additional structure on the vector space, that of orienta¬ 
tion. 

Two orthonormal bases $\mathcal{B}$ and $\mathcal{B}'$ of V will be said to have the same
orientation if the (unique) orthogonal map F sending $\mathcal{B}$ into $\mathcal{B}'$ is a rotation.
If this orthogonal map is not a rotation, then we say that $\mathcal{B}$ and $\mathcal{B}'$ have
opposite orientation.

Remark. If $\mathcal{B}$ and $\mathcal{B}'$ have the same orientation, and if $\mathcal{B}'$, $\mathcal{B}''$ have the
same orientation, then $\mathcal{B}$ and $\mathcal{B}''$ have the same orientation. Furthermore,
$\mathcal{B}$ has the same orientation as itself. If $\mathcal{B}$ and $\mathcal{B}'$ have the same orientation,
then $\mathcal{B}'$ and $\mathcal{B}$ have the same orientation. These statements are easily
proved, and the arguments will be left to the reader.





The set of all orthonormal bases of V having a given orientation will be 
said to determine an orientation of V. There exist exactly two orientations 
of V. (Trivial proof, left as an exercise.) 

Let us now assume given an orientation on V. Let $\angle PQ$ be an angle. Of
the two unit vectors which are perpendicular to P, exactly one of them,
say E, will be such that {P, E} has the given orientation (because {P, E}
and {P, -E} have opposite orientations).

There exist numbers a, b such that

$Q = aP + bE$.

Since Q has length 1, we see that $Q \cdot Q = 1 = a^2 + b^2$. Thus relative to
the basis {P, E}, we see that the point having coordinates (a, b) lies on
the unit circle. We define the cosine of the angle $\angle PQ$ to be the number a,
and the sine of the angle $\angle PQ$ to be the number b. We abbreviate these
by cos and sin.
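
In coordinates, this definition amounts to two scalar products. The following sketch assumes that V is the plane with its standard scalar product and that the given orientation is that of the standard basis, so that E is P turned a quarter turn counterclockwise. The name `cos_sin_of_angle` is hypothetical.

```python
def cos_sin_of_angle(p, q):
    """Return (a, b) with Q = a*P + b*E for unit vectors p, q.

    Assumption: the orientation is that of the standard basis, so
    E = (-p[1], p[0]). Then a is the cosine and b the sine of the
    angle determined by the ordered pair (p, q).
    """
    e = (-p[1], p[0])
    a = q[0] * p[0] + q[1] * p[1]
    b = q[0] * e[0] + q[1] * e[1]
    return a, b

P = (1.0, 0.0)
Q = (0.0, 1.0)
forward = cos_sin_of_angle(P, Q)   # angle from P to Q
backward = cos_sin_of_angle(Q, P)  # the ordered pair reversed
```

Reversing the ordered pair keeps the cosine and changes the sign of the sine, which reflects the fact that an angle is an ordered pair of half-lines.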

Let $\angle PQ$ and $\angle AB$ be two angles, and let F be the rotation such that
F(P) = A. If F(Q) = B, then we shall say that $\angle PQ$ is congruent to
$\angle AB$. It is easily proved that in that case, $\angle AB$ is congruent to $\angle PQ$.
Trivially, $\angle PQ$ is congruent to itself. It is also easily proved that if
$\angle PQ$ is congruent to $\angle AB$ and $\angle AB$ is congruent to $\angle CD$, then $\angle PQ$
is congruent to $\angle CD$. We shall leave these easy proofs as exercises.

Proposition 11. Two angles $\angle PQ$ and $\angle AB$ are congruent if and
only if

$\cos \angle PQ = \cos \angle AB$,
$\sin \angle PQ = \sin \angle AB$.

Proof. Assume first that the two angles are congruent, and let F be the
rotation such that F(P) = A, F(Q) = B. Let E be the unit vector such
that {P, E} is the orthonormal basis having the given orientation. By
definition, {F(P), F(E)} has the same orientation. Let a, b be numbers
such that

$Q = aP + bE$.

Since F is linear, we get

$F(Q) = aF(P) + bF(E)$.

Since F(P) = A, it follows by definition that the cosines of our two angles
are equal, and so are their sines.

The converse will be left as an exercise.

Let $\angle PQ$ be an angle. We define minus $\angle PQ$ to be the angle $\angle QP$,
write it $-\angle PQ$, and also call it the negative of $\angle PQ$. We leave it as an
exercise to prove that if two angles are congruent, then their negatives are
congruent.




Let $\angle PQ$ and $\angle QR$ be two angles. We define their sum to be the angle
$\angle PR$.

Let {P, E} be an orthonormal basis having the given orientation. We
call the angle $\angle PE$ a positive right angle. We call $\angle PQ$ a flat angle if

$Q = -P$.

It is then possible to prove entirely within the context of linear algebra,
directly from our definitions, all the properties of sines and cosines of
angles which have been proved in §1 for the sin and cos functions (i.e.
the properties of Propositions 2 and 3). All the relevant definitions have
now been made. In fact, we note that the addition formula for the cosine
function was proved in our First Course by a method which applies
verbatim, since all the concepts involved in it have now received an analytic
definition.

It is a good exercise for any one interested to carry out these proofs.

It is also possible to carry out the proofs by first relating directly our
sines and cosines of angles with the sin and cos functions. This is done as
follows.

Proposition 12. Assume that an orientation of V has been fixed. Given
a number $\theta$, let $F_\theta$ be the rotation whose associated matrix with respect to
any orthonormal basis having the given orientation is

$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$.

If $\theta$, $\varphi$ are numbers, then $F_{\theta+\varphi} = F_\theta F_\varphi = F_\varphi F_\theta$. Also, $F_{-\theta} = F_\theta^{-1}$.

We have $F_\theta = F_\varphi$ if and only if $\theta$ and $\varphi$ differ by a period $2n\pi$.

Proof. The fact that the matrix of a rotation is the same for two
orthonormal bases having the same orientation was proved in Proposition 8. A
direct multiplication of matrices will show that our assertions are true,
using the properties of sin, cos proved in Proposition 2, §1.
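
The direct multiplication of matrices mentioned in the proof can be checked numerically. This is a sketch only: `math.cos` and `math.sin` are assumed to agree with the functions of the text, matrices are stored as pairs of rows, and the helper names `rotation_matrix`, `matmul` and `close` are hypothetical.

```python
import math

def rotation_matrix(theta):
    """The matrix ((cos, -sin), (sin, cos)) of the proposition."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def matmul(m, n):
    """Product of two 2 x 2 matrices given as pairs of rows."""
    return tuple(
        tuple(sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def close(m, n, tol=1e-12):
    """Entrywise comparison of two 2 x 2 matrices up to rounding."""
    return all(abs(m[i][j] - n[i][j]) < tol
               for i in range(2) for j in range(2))

theta, phi = 0.4, 1.1  # arbitrary sample values
product = matmul(rotation_matrix(theta), rotation_matrix(phi))
identity = ((1.0, 0.0), (0.0, 1.0))
```

The entries of the product are exactly the right-hand sides of formulas (4) and (5) of Proposition 2, which is why the product equals the matrix for the sum of the two numbers.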

Given an angle $\angle PQ$, we observe that we can find its sine and cosine as
follows. We let F be the rotation such that F(P) = Q. By Proposition 12,
there exists a number $\theta$ such that $F = F_\theta$. Then

$\cos \angle PQ = \cos\theta$ and $\sin \angle PQ = \sin\theta$.

To each angle, we have associated a rotation, and hence a set of numbers
of type $\theta + 2n\pi$. Conversely, given a rotation F and a point P on the
unit circle, we can associate to these the angle $\angle PQ$ where Q = F(P).

Let $\angle PQ$ be an angle and $\theta$ a number. We define the expression "$\angle PQ$
has $\theta$ radians" to mean that $F_\theta$ is the rotation associated with the angle
$\angle PQ$. If $\varphi = \theta + 2n\pi$, and if $\angle PQ$ has $\theta$ radians, then $\angle PQ$ also has
$\varphi$ radians.





Using Proposition 12, it now follows trivially that the cosine of the sum 
of two angles satisfies the usual addition formula, if we use the analogous 
formula for the cos function, proved in Proposition 2, §1. We give the 
proof as an example. 

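Let $\angle PQ$ have $\theta$ radians, and $\angle QR$ have $\varphi$ radians. Then $F_\theta(P) = Q$
and $F_\varphi(Q) = R$. Hence

$F_{\theta+\varphi}(P) = F_\varphi(F_\theta(P)) = R$.

Hence $\angle PR$ has $\theta + \varphi$ radians. Applying the formula

$\cos(\theta + \varphi) = \cos\theta \cos\varphi - \sin\theta \sin\varphi$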

and the definitions, we get the addition formula for the cosine of the sum 
of two angles. 

The addition formula for the sine is proved in the same way. 



Answers to Exercises 




ANSWERS TO EXERCISES 


I am much indebted to Mr. I. Schochetman and Mr. J. Hennefeld for the
answers to the exercises.


Chapter I, §1 



      A + B             A - B              3A               -2B

1.    (1, 0)            (3, -2)            (6, -3)          (2, -2)
2.    (-1, 7)           (-1, -1)           (-3, 9)          (0, -8)
3.    (1, 0, 6)         (3, -2, 4)         (6, -3, 15)      (2, -2, -2)
4.    (-2, 1, -1)       (0, -5, 7)         (-3, -6, 9)      (2, -6, 8)
5.    (3π, 0, 6)        (-π, 6, -8)        (3π, 9, -3)      (-4π, 6, -14)
6.    (15 + π, 1, 3)    (15 - π, -5, 5)    (45, -6, 12)     (-2π, -6, 2)


Chapter I, §3 

1. 5, 10, 30, 14, $10 + \pi^2$, 245      2. -3, 12, 2, -17, $2\pi^2 - 16$, $15\pi - 10$

4. (b) and (d) 6. §, f, 0 


Chapter I, §4 

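1. $\sqrt{5}$, $\sqrt{10}$, $\sqrt{30}$, $\sqrt{14}$, $\sqrt{10 + \pi^2}$, $\sqrt{245}$

2. $\sqrt{2}$, 4, $\sqrt{3}$, $\sqrt{26}$, $\sqrt{4\pi^2 + 58}$, $\sqrt{\pi^2 + 10}$

3. $\tfrac{3}{2}(1, -1)$, $(0, 3)$, $\tfrac{2}{3}(-1, 1, 1)$, $\tfrac{17}{26}(1, -3, 4)$,
   $\frac{\pi^2 - 8}{2\pi^2 + 29}(2\pi, -3, 7)$, $\frac{15\pi - 10}{\pi^2 + 10}(\pi, 3, -1)$

4. $\tfrac{3}{5}(-2, 1)$, $\tfrac{6}{5}(-1, 3)$, $\tfrac{1}{15}(2, -1, 5)$, $\tfrac{17}{14}(1, 2, -3)$,
   $\frac{2\pi^2 - 16}{\pi^2 + 10}(\pi, 3, -1)$, $\frac{3\pi - 2}{49}(15, -2, 4)$

5. 0, 0      6. $\sqrt{\pi}$, $\sqrt{\pi}$      7. $\sqrt{2\pi}$      8. $\sqrt{2}$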

Chapter I, §5 

1. X = (1, 1, -1) + t(3, 0, -4)      2. X = (-1, 5, 2) + t(-4, 9, 1)

3. y = x + 8      4. 4y = 5x - 7      6. (c) and (d)

7. (a) x - y + 3z = -1      (b) $3x + 2y - 4z = 2\pi + 26$

(c) x - 5z = -33

8. (a) 2x + y + 2z = 7      (b) 7x - 8y - 9z = -29

(c) y + z = 1

0 (3 _9 —ft) ( 1 . 5 .— 7 ) (others would be constant multiples of these) 

lo! (a) + 5)'^'' 11- (151" + 261 + 21)*^=^. >7146/15 

12. (-2, 1,5) 13. (11. 13, -7) 

14. (a) X * (1, 0,-1) + 1(-2,_I, 5) (b) X = (1^0. 0) + 1(11, 13,-7) 

15. (a) -i (b) -2/V42 (c) 4/v/6B (d) -n/2/3 




16. $t = \frac{P \cdot N - Q \cdot N}{N \cdot N}$


19. (-4,^,^) 

21. (a) J(-3,8.I) 

22. ^ 


17. (1,3, -2) 

V2240 

(b) (§, 0) (^. 1) 


18. 2/y/3 


Chapter I, §6 

1. (-4, -3, 1) 

4. 0 

Chapter II, §1

1. $(e^t, -\sin t, \cos t)$

3. $(-\sin t, \cos t)$
7. B 

9. er + y -f- 2z = c2 _|_ 3 

11. [(X(0 -Q).{X{t) 
Chapter II, §2 


2 . (- 1 , 1 ,-!) 

5. £ 3 , El, E 2 in that order. 


3. (-9, 6,-1) 


2 . (2 cos 2, ^ , .) 

4. (—3 sin 31, 3 cos 31) 

5. y — y/3x and y »= 0 

10. X + y = 1 

12. n/2 13. 2VTZ 


2. (a) ( 0 , 1, + I(-4 0, 1) (b) (1, 2, I) + ((1, 2. 2) 

(c) (e^, e-3, 3\/2) + —3e-^, 3\/2) (d) (1, 1, I) + ((1, 3, 4) 

TrVil ^ ^ . . 5 /- 6 + 


(b)#(ViT-i) + 5(,<,g6±^ 


3. (a) . 

8 . ... c . 4 y^--0 5 

(c) 6 

4- ^r/2 5. (2, 0, 4) and (18, 4, 12) 

6. (a) X + 42 = 7r/2 _ (b) y = 2x 

(c) —X -f e®y + v/2 e^z == 6e^ (d) 2x — 2y + z = 1 


Chapter III, §2 



       ∂f/∂x                            ∂f/∂y                             ∂f/∂z

1.     y                                x                                 1
2.     2xy^5                            5x^2 y^4                          0
3.     y cos(xy)                        x cos(xy)                         -sin(z)
4.     -y sin(xy)                       -x sin(xy)                        0
5.     yz cos(xyz)                      xz cos(xyz)                       xy cos(xyz)
6.     yz e^(xyz)                       xz e^(xyz)                        xy e^(xyz)
7.     2x sin(yz)                       x^2 z cos(yz)                     x^2 y cos(yz)
8.     yz                               xz                                xy
9.     z + y                            z + x                             x + y
10.    cos(y - 3z) + y/√(1 - x^2 y^2)   -x sin(y - 3z) + x/√(1 - x^2 y^2)  3x sin(y - 3z)





11. (1) (2, 1, 1)      (2) (64, 80, 0)      (6) $e^6(6, 3, 2)$

(8) (6, 3, 2)      (9) (5, 4, 3)

12. (4) (0, 0, 0)      (5) $\pi \cos(\pi^2)(\pi, 1, 1)$

(7) $(2 \sin \pi^2, \pi \cos \pi^2, \pi \cos \pi^2)$

13. (-1, -2, 1)

14. $y x^{y-1}$, $x^y \log x$


Chapter III, §3 

6. $\lim_{h \to 0} g(h, k) = -1$, $\lim_{k \to 0} g(h, k) = 1$,

$\lim_{k \to 0} \lim_{h \to 0} g(h, k) = -1$, $\lim_{h \to 0} \lim_{k \to 0} g(h, k) = 1$


Chapter IV, §1


1 - 
dr 

2. (a) 


_ df du df ^ 

dx dr dy dr dt dx dt dy dt 

t “ I = - '^y^ 

^ = 3x^ + (3 + 2s)yz — 3iz + 6sxy — 2sy^ 
ds 

^ = 6x* + %yz - 3x2 + Uxy - 2ty 
oi 


(b)^ = 


V + 1 


a/ 


x"+ 1 


3 I 

-k 

9. (a) 

(d) 


dx (1 — xi/)2 dy (1 — xy)2 

dj_ ^ {x + l)sin (3f - s) 
ds (1 — xy)2 

df _ 2(y^ -f- 1) cos 21 - 3(x^ + 1) sin (3l - ^) 
dl ~ (1 - xy)^ 

X df y 


(^ 2 + y 2 -f- 22 ) 1/2 dy (x 2 + y 2 4 - 22 ) 1/2 


X. 


-X/r^ 

2e-^X 


(b) 2X 
(e) -X/r^ 


(c) -3X/r« 

(0 -4mX/r"+2 


Chapter IV, §2 



1.     Plane                      Line

(a)    6x + 2y + 3z = 49          X = (6, 2, 3) + t(12, 4, 6)
(b)    x + y + 2z = 2             X = (1, 1, 0) + t(1, 1, 2)
(c)    13x + 15y + z = -15        X = (2, -3, 4) + t(13, 15, 1)
(d)    6x - 2y + 15z = 22         X = (1, 7, 2) + t(-6, 2, -15)
(e)    4x + y + z = 13            X = (2, 1, 4) + t(8, 2, 2)
(f)    z = 0                      X = (1, π/2, 0) + t(0, 0, π/2 + 1)






2. (a) (3, 0, 1)      (b) $X = (\log 3, 3\pi/2, -3) + t(3, 0, 1)$

(c) $3x + z = 3 \log 3 - 3$

3. (a) X = (3, 2, -6) + t(2, -3, 0)      (b) X = (2, 1, -2) + t(-5, 4, -3)

(c) X = (3, 2, 2) + t(2, 3, 0)

4. Distance = ■'/(A{0 — Q]^ 


Chapter IV, §3 


1- (a) ^ 

(b) max = VlO, 

min = —VlO 

2. (a) 3/2 n/5 

a 

(b) M (=) V580 

3, Increasing i — 

9\/3 3V3\ 

2 ' 2 /’ 

decreasing (9\/3/2, 3\/3/2) 

/ 3 

3 —3\ 


« ( 4 . 63 /^' 

2-67/4’ 67/4/ 

(b) (1.2, -1, 1) 

Chapter IV, §4 



1 . log||X|| 


2. -l/2r2 

Chapter V, §1

1. No      2. No      3. No      4. No



n +2 

5. (a) r 

(b) logr 

(0 If 1 . 

6. 2x^y 

7. X sin xy 

f A 

8. x3j/2 

9. -h y* 

10. 

11. ff(r) 


12. Given the vector field $F = (f_1, \ldots, f_n)$ in n-space, defined on a rectangle,
centered at the origin, assume that

$\frac{\partial f_i}{\partial x_j} = \frac{\partial f_j}{\partial x_i}$

for all indices i, j. For n = 3, define $\varphi(x, y, z)$ to be

$\int_0^x f_1(t, y, z)\,dt + \int_0^y f_2(0, t, z)\,dt + \int_0^z f_3(0, 0, t)\,dt$,

and similarly for n variables. Using the hypothesis and the fact that a partial
derivative of parameters can be taken in and out of an integral, you will find
easily that $\varphi$ is a potential function for F.

Conversely, given a vector field $F = (f_1, \ldots, f_n)$ on an open set U, if there
exists a potential function, and if the partial derivatives of the functions $f_i$ exist
and are continuous, then the relations

$\frac{\partial f_i}{\partial x_j} = \frac{\partial f_j}{\partial x_i}$

must be satisfied for all i, j, for the same reason as that given in the text for two
variables. This generalizes Theorem 2.


Chapter V, §2 

1. -369/10 2. 23/6 3. 0 4. 0 5. 54 

6. V3C/2 7. 4/3 8. —T — § 9. 4/15 10. 4t 

II- 3t/4 12. —1/2 13. —56 





14. Assume that $F = \operatorname{grad} \varphi$ on U. Let C be a continuously differentiable
curve between two points P and Q in U. By the chain rule, we have

$F(C(t)) \cdot \frac{dC}{dt} = (\operatorname{grad} \varphi)(C(t)) \cdot \frac{dC}{dt} = \frac{d}{dt}\,\varphi(C(t))$.

Hence the integral of F along C is the integral of the derivative of $\varphi(C(t))$ taken
between the corresponding limits, and hence is equal to $\varphi(Q) - \varphi(P)$. If the
curve is closed, i.e. Q = P, this is equal to 0. In the piecewise continuously
differentiable case, there is a sequence of continuously differentiable curves between
points $\{P, P_1\}, \{P_1, P_2\}, \ldots, \{P_r, P\}$ and hence the integral is equal to

$\varphi(P_1) - \varphi(P) + \varphi(P_2) - \varphi(P_1) + \cdots + \varphi(P) - \varphi(P_r) = 0$.

15. First an observation. Let C be a continuously differentiable curve defined
on the interval [a, b], a < b. Let C(a) = P and C(b) = Q. We can define a
curve $C^-$ by letting $C^-(t) = C(a + b - t)$, for all t in the interval [a, b]. Then
$C^-(a) = Q$ and $C^-(b) = P$. Thus $C^-$ goes from Q to P. We call $C^-$ the
negative of C. If we integrate along $C^-$, we also say that we integrate along C in
reverse direction. A trivial application of the chain rule for functions of one
variable shows that for any continuous vector field F, we have

$\int_{C^-} F = -\int_C F$.

The same observation now is valid for a piecewise continuously differentiable
curve.

Assume that the integral of F along any closed curve in U is 0. Let $C_1$, $C_2$
be two curves in U from a point P to a point Q in U. Then $C_1$ followed by $C_2^-$
is a closed curve from P to P. Hence

$\int_{C_1} F + \int_{C_2^-} F = 0$.

By the previous observation, we see that the integral of F along $C_1$ is equal to
the integral of F along $C_2$. Under our assumption, we can write

$\int_P^Q F$

to mean the integral of F from P to Q along any (piecewise continuously
differentiable) curve from P to Q.

Let now $P_0$ be a fixed point of U, and define a function $\varphi$ on U by the rule

$\varphi(P) = \int_{P_0}^P F$.

We must show that $D_i \varphi(P)$ exists for all P in U and if $F = (f_1, \ldots, f_n)$ is the
representation of F by coordinate functions, then $D_i \varphi(P) = f_i(P)$. To do this,
let $E_i$ be one of the standard unit vectors. The Newton quotient of $\varphi$ at P for
the i-th variable is then

$\frac{\varphi(P + hE_i) - \varphi(P)}{h}$.

The integral of F from $P_0$ to $P + hE_i$ can be taken along a curve from $P_0$ to P,
and then from P to $P + hE_i$. After cancellation of the integral from $P_0$ to P,
we obtain

$\frac{\varphi(P + hE_i) - \varphi(P)}{h} = \frac{1}{h} \int_P^{P + hE_i} F$,

taking the integral over the line segment $C(t) = P + tE_i$, with $0 \le t \le h$.
Note that $dC/dt = E_i$. Hence $F(C(t)) \cdot C'(t) = f_i(C(t))$, and our Newton
quotient is therefore equal to

$\frac{1}{h} \int_0^h f_i(C(t))\,dt$.

If g is a continuous function of one variable, then by the so-called fundamental
theorem of calculus, we know that the derivative of the integral of g is equal to
g, i.e.

$\lim_{h \to 0} \frac{1}{h} \int_0^h g(t)\,dt = g(0)$.

We apply this to the function $g(t) = f_i(C(t)) = f_i(P + tE_i)$. We therefore
obtain the limit

$\lim_{h \to 0} \frac{\varphi(P + hE_i) - \varphi(P)}{h} = f_i(P)$.

This proves what we wanted.


Chapter VI, §1 



1 d^/dx^ 


d^f/dx dy 

1. 

yV" 

1 

1 

yxe^'f 4- e**' 

2. 

— ?/“ sin xy 

—z® sin xy 

—xy sin xy 4* cos xy 

3. 

2y^ ' 

6x2y 

6xy2 4- 3 

4. 

0 ! 

1 

2 

1 

1 2 

5. 

~r 4xV*-"*'' ! 

4- 4y2) 

4xye**'*-«'* 

6. 

2 cos (x" 4- y) 

—4i' sin (x“ 4- y) 

—sin (z^ + y) 

—2x sin (x^ 4" y) 

7. 

— (3x^ 4- y)^ cos (x^ + ly) 
—6x sin (x^ 4- xy) 

—z” cos (z® 4- xy) 

— (3X2 _j_ j-y-) 

—sin (x^ 4- xy) 


g aV ^ 2(1 + (j:- - 2iy)^) - (2j - 2y)\x^ ~ 2xy) 

■ (1 + (J-^ - 2XV)2)2 

^ ^ -(1 + (/ - 2xy)~) - 2y(x^ - 2xy) 

di/2 (1 + (j.2 _ 2xy)2)2 

^ -2(1 + (X- - 2xy)^) - (2j - 2y)(x^ - 2xy)i-2y) 

dxdy (1 -j- (x2 — 2xy)2)2 





9. all three = e*'*’*' 

11. 1 12. 2x 

14. (1 — cosxyr 

15. sin (z + y + 2 ) 


17. - 


48zy2 


(z2 + + z2)4 


10. all three = —sin (z + y) 
13. + 3zy2 + x^y^z^) 

3xyz sin xyz 

16. —cos (x + y + z) 

18. 6z^y 


Chapter VI, §2 

1. $9D_1^2 + 12D_1D_2 + 4D_2^2$

2. $D_1^2 + D_2^2 + D_3^2 + 2D_1D_2 + 2D_2D_3 + 2D_1D_3$

3. $D_1^2 - D_2^2$

4. $D_1^2 + 2D_1D_2 + D_2^2$

5. $D_1^3 + 3D_1^2D_2 + 3D_1D_2^2 + D_2^3$

6. $D_1^4 + 4D_1^3D_2 + 6D_1^2D_2^2 + 4D_1D_2^3 + D_2^4$

7. $2D_1^2 - D_1D_2 - 3D_2^2$

12 . (ff + 


8 . D1D2 — D3D2 oDjDj — 5 Dj 






13. 8 


14. 4 


15. 4 


16. 1 


Chapter VI. §3 
1. zy 


2 . 1 


2 2 

5, 14-z + y + y + jy + |- 


8. y + zy 
II. (a) Yes, 0 


9. z + zy + 2y2 
(b) Yes, 1 


2 2 .3 2 


3. zy 


6 , 1 -^ 


4. + y^ 

7. X 


10. Yes. 0 

12. Yes, 0 13. Yes. 0 

15. 0 


17. Terms up to degree 2 given in text. Term of degree 3 is i(z 1- 2y)^. 


Chapter VI, §4 

3. First observe that for each point X we have

$f(X) - f(0) = \int_0^1 Df(tX)\,dt$,

where $D = x_1D_1 + \cdots + x_nD_n$. Assuming that f(0) = 0, and repeating the
argument, assuming that grad f(0) = 0, we obtain

$f(X) = \int_0^1 \int_0^1 t\,D^2 f(stX)\,ds\,dt$.

Thus we find 


f(X) = ^ hii(.X)zai, 

i.j-l 


where 


hii(X) = t Di DifislX) dsdi, if i 9^ j. 


1 A 


hii(X) ^ I ( hi Di DifislX) ds dt, if i = j. 

Jo Jo 

We have ha = ha because D,D> = DjDi. 

6. (a) X + t(Y - X)

(b) By the mean value theorem applied to the function

$g(t) = f(X + t(Y - X))$,

we get

$f(Y) - f(X) = (\operatorname{grad} f(Z)) \cdot (Y - X)$

for some Z on the line segment. Now use the Schwarz inequality.


Chapter VII, §1 

1. (2, 1), neither max nor min

2. $((2n + 1)\pi, 1)$ and $(2n\pi, 1)$, neither max nor min

3. (0, 0, 0), min, value 0

4. $(\sqrt{2}/2, \sqrt{2}/2)$, neither local max nor min. (Hint: Change variables,
letting u = x + y and v = x - y. Then the critical point is at $(\sqrt{2}, 0)$, and in
the (u, v)-plane, near this point, the function increases in one direction, and
decreases in the other.)

5. All points of form (0, t, -t), neither max nor min.

6. All (x, y, z) with $x^2 + y^2 + z^2 = 2n\pi$ are local max, value 1.

All (x, y, z) with $x^2 + y^2 + z^2 = (2n + 1)\pi$ are local min, value -1.

7. All points (x, 0) and (0, y) are mins, value 0.

8. (0, 0), min, value 0.

9. (f, 0) min, value 0. 

10. $(0, n\pi)$, neither max nor min.

11. (1/2, 0), neither max nor min. 

12. (0, 0, 0), max, value 1. 

13. (0,0,0), min, value 1. 


Chapter VII, §2 

1. x2 -I- 4xy — y2 

2. At ((2n + l)ir, l), —ly. 

3. a; + y + z 


At (2nir, 1), -fxy. 



n/2 


(t + + ^) 





5. xy -f xz 

6. At (o, 6, c) such that = 2nT, the form is 

—2(a^i^ + + c^z^) — 4(abxy + aczz + bcyz). 

At the point (a, b, c) such that + 6' -f = (2n + Dtt, the form is 
2(a^i^ + b^y^ + c^z^) + 4(abzy + aczz + bcyz). 

7. At points (o, 0) we get a-y^. At points (0, 6). we get 

8. 9. 0 10. ±xy II. Z-+ 2y- 

12. -x2 - y2 _ ,2 13. 3,2 + + 2^; 

Chapter VII, §3 

1. Min = -2 at (-1, -I), max = 2 at (j, ]) 

2. Max = v/3 at VZ/Hl, I, 1), min -VS at -Va, 3(1, l, l) 

3. Max i at (V2/2, V2/2) and (-V2/2, -\/2/2) 

4. Max at (J, §), no min 

5. Min 0 at (0, 0), max 2/c at (0, ±1), rel. max at (±1,0) 

6. Max = I at (1,0), min = 1/9 at (3.0) 

7. (a) max (b) neither (c) neither (d) min 

8. f = {2n+ Dtt, so (-1.0, 1) and (-1,0, -1) 


Chapter VII, §4 

h -1/V2 2. 1 + 1/V2 3. at (§, 5 .^) min = 12 

4. ^ = i(A + 5 + C).min valucisj(/l2 + _ lyj _ _ no 

5. 45 at ±(\/3, VS) 6. at (1, 1, 1) 7. Min 0, max 0 

8. Max at (jr/8, —ir/8), value 2cos2(7r/8): min at (5;r/8, 3ir, 8) value 

cos® (5ir/8) + cos® (3ir/8) 

9- (0,0, ±1) 10. No min, max = ^ at (i, i) 11. 1 


14. ixi 





Chapter VIII, §1 


1. (a) 12 (b) 11/5 (c) 1/10 (d)2 + jr2/2 (e) 5/6 (f) 7r/4 

2. (a) -3jr/2 (b) e - 1/e (c) ir® - 40/9 (d) 63/32 

3. 9i - 28/3 


Chapter VIII, §2 

1. (e - l)ir 2. 3ir/2 3. ir(l - e-*) 4. tt 


5. 2fettV3 6. 3kira*/2 7. kir/4 


Chapter VIII, §3 

2. 0 3. ka*r 4. 2)rt(6® - a®) 5. 7r6aV4 

6, kwa*/2 7. jr/8 8. 2irj^- ^ ~ ^ ' 

. 2 -l4-\/5 

whore Ti) - — - 

9. |o®(3t — 4) 10. ira^ 

11. (a) ir/3 (b) 2irv/2/3 (c) n/2 (d) 7r/32 

12. (a) 25 (b) 16/2 (c) la^b^/S 





Chapter IX, §2 

2. (a) A - B, (U-1) 

(c) A + B, (1, 1) 

3. (a) (i, -i, i) (b) 

7. (3, 5) 


(b) \A + ^B, ( 5 , §) 
(d) 3A + 2B, (3, 2) 
( 1 , 0 , 1 ) (c) 

8 . (-5, 3) 


Chapter X, §1 





2A + B 









Chapter X, §2 



7. Same 


i)' 








11. Rows of .4: (1, 2, 3), (—1,0,2). Columns of .4: 

1 \ /2\ /S 

-1 

No answer given for the rest. 

Chapter X, §5 
4. dim 4 

6. (a) (1/√3)(1, 1, -1) and (1/√2)(1, 0, 1)

(b) (1/√6)(2, 1, 1), (1/(5√3))(-1, 7, -5)

7. (1/√6)(1, 2, 1, 0) and (1/√31)(-1, -2, 5, 1)

8. id,-1,1,1), 

y/2 Vl8 

9. VM ((=* - 3(/4), VZt 

10. V80 («2 - Zt/i),- n/3 t, 10^2 - 12( + 3 

11. (a) 1 (b) 1 (c) 0 (d) 2 

12 . n - 1 13. n - 2 
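
A quick numerical check of answers 6 and 7, offered only as a sketch. It reads the unnormalized vectors as (1, 1, -1), (1, 0, 1); (2, 1, 1), (-1, 7, -5); and (1, 2, 1, 0), (-1, -2, 5, 1), and confirms that each pair is orthogonal and that the squared lengths match the radicals 3, 2, 6, 75 = (5√3)^2, 6, 31. The helper names `dot` and `pairs` are illustrative and do not come from the text.

```python
def dot(a, b):
    """Ordinary dot product of two vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))

# Unnormalized vectors as read from answers 6(a), 6(b) and 7.
pairs = {
    "6(a)": ((1, 1, -1), (1, 0, 1)),
    "6(b)": ((2, 1, 1), (-1, 7, -5)),
    "7": ((1, 2, 1, 0), (-1, -2, 5, 1)),
}

# Inner product of each pair (should be 0) and squared lengths of both vectors.
inner = {name: dot(a, b) for name, (a, b) in pairs.items()}
squared_lengths = {name: (dot(a, a), dot(b, b)) for name, (a, b) in pairs.items()}

for name in pairs:
    print(name, inner[name], squared_lengths[name])
```

Dividing each vector by the square root of its squared length gives the orthonormal pairs listed in the answers.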


Chapter XI, §1

1. (a) cos x   (b) e^x   (c) 1/x

2. (a) e^x - 1   (b) arctan x   (c) sin x

3. (a) n   (b) 13   (c) 6

4. (a) (e, 1)   (b) (1, 0)   (c) (1/e, -1)

5. (a) (e + 1, 3)   (b) (e^2 + 2, 6)   (c) (1, 0)

6. (a) (2, 0)   (b) («, t)

7. (a) 1   (b) 11

8. ellipse 9x^2 + 4y^2 = 36   9. line x = 2y

10. circle x^2 + y^2 = ^ 2 ^ circle x^2 + y^2 =

11. cylinder, radius 1, z-axis = axis of cylinder

12. circle x^2 + y^2 = 1


Chapter XI, §2

1. All except (c), (g). 

4. If u is one element such that Tu = w, then the set of all such elements is 
the set of elements u + v where Tv = 0.

8 . Only Ex. 8 

9. If F(A) = 0, image = point F(P). If F(A) ≠ 0, image is the line
F(P) + tF(A).

12. Parallelogram whose vertices are B, 3A, 3A + B, 0.

13. Parallelogram whose vertices are 0, 2B, bA, bA + 2B. 
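
Answer 4 says that the solutions of Tu = w are exactly one particular solution plus the elements of the kernel. A minimal sketch of one direction of that statement, assuming a hypothetical map T(x, y) = (x + 2y, 2x + 4y) chosen only for illustration (it is not taken from the exercises):

```python
def T(p):
    """Hypothetical linear map T(x, y) = (x + 2y, 2x + 4y), for illustration only."""
    x, y = p
    return (x + 2 * y, 2 * x + 4 * y)

u = (1, 0)    # one particular solution of T(u) = w
w = T(u)      # here w = (1, 2)
v = (2, -1)   # an element of the kernel: T(v) = (0, 0)

def shifted(t):
    """Return u + t*v, which T should again send to w."""
    return (u[0] + t * v[0], u[1] + t * v[1])

# Every u + t*v solves T(x) = w, because T(u + t*v) = T(u) + t*T(v) = w + 0.
solutions = [shifted(t) for t in range(-3, 4)]
all_hit_w = all(T(s) == w for s in solutions)
print(w, all_hit_w)
```

The converse (every solution has this form) follows from linearity: if Tu' = w, then T(u' - u) = 0.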





Chapter XI, §3 

2. X = (1, 1, 0, -1) + t(1, -2, 1, 4) + u(3, -3, 1, 0)

6 . Constant functions 

7. Ker D^2 = polynomials of deg ≤ 1, Ker D^n = polynomials of deg ≤ n - 1

9. Constant multiples of e*. 10. Constant multiples of e*'. 
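
Answer 7 can be illustrated with a short sketch, assuming polynomials are stored as coefficient lists (`coeffs[k]` is the coefficient of x^k). The representation and the names `derivative` and `apply_D` are assumptions made for this example only.

```python
def derivative(coeffs):
    """Derivative of a polynomial given as [c0, c1, c2, ...] for c0 + c1*x + c2*x^2 + ..."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def apply_D(coeffs, n):
    """Apply the derivative n times."""
    for _ in range(n):
        coeffs = derivative(coeffs)
    return coeffs

def is_zero(coeffs):
    """A polynomial is zero when it has no nonzero coefficients."""
    return all(c == 0 for c in coeffs)

p = [7, -2, 5]   # 7 - 2x + 5x^2, degree 2 = n - 1 with n = 3
q = [0, 0, 0, 1] # x^3, degree 3 = n

print(is_zero(apply_D(p, 3)))  # p lies in Ker D^3
print(is_zero(apply_D(q, 3)))  # x^3 does not: D^3(x^3) = 6
```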

Chapter XI, §5 

1. (a) 2 (b) 2 (c) 2 (d) 1

3. n 4. (a) 1 (b) 2 (c) 1 (d) 0

Chapter XII, §1

*• (5, 3) (b) (5, 0) (c) (5, 1) (d) (0, -3) 


4 



5. Second column of A 6. i-th column of A 

Chapter XII, §2 







( cos θ   -sin θ )
( sin θ    cos θ )

4. (1/√2)(-1, 3)

5. (-3, -

6. x' = x cos θ + y sin θ, y' = -x sin θ + y cos θ


8 . (a) 


( 0 1 0 > 
0 0 2 
0 0 a 


9. mn 
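
The formulas of answer 6, x' = x cos θ + y sin θ and y' = -x sin θ + y cos θ, can be tried numerically. The sketch below is only an illustration: the function name `new_coordinates` and the sample values are assumptions. It checks that the change of coordinates preserves length, as an orthogonal map should.

```python
import math

def new_coordinates(x, y, theta):
    """Apply x' = x cos(theta) + y sin(theta), y' = -x sin(theta) + y cos(theta)."""
    xp = x * math.cos(theta) + y * math.sin(theta)
    yp = -x * math.sin(theta) + y * math.cos(theta)
    return xp, yp

# Sample point and angle, chosen arbitrarily.
xp, yp = new_coordinates(3.0, 4.0, 0.7)
length_before = math.hypot(3.0, 4.0)
length_after = math.hypot(xp, yp)
print(length_before, length_after)

# With theta = pi/2 the point (1, 0) gets coordinates (0, -1), up to rounding.
quarter = new_coordinates(1.0, 0.0, math.pi / 2)
print(quarter)
```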

Chapter XII, §4 
1. IA = AI = A


3. (a) 


10 . mn 


\0 0 / 

( 0 1 0 0 o' 

0 0 0 0 0 

0 0 10 0 

0 0 0 2 1 

0 0 0 0 2 

11 . n 


0 -1 


2 . 0 

1 o\ 


3 3 37 

1 1 -18 


5. AB = 


5 -1 


BA = 


6 . AC ^ CA = 


7 14 


BC = CB ^ 


\21 -7/ ^ 

If C = zI, where z is a number, then AC = CA = zA.


7. (3, 1, 5), first row 


8 . Second row, third row, i-th row 


0 0 I 


11. A 


0 0 0 » * 0 matrix. U B = 


.0 0 0 , 


0 0 1 1 ' 
0 0 0 1 
0 0 0 0 
0 0 0 0 


0 111 
0 0 11 
0 0 0 1 
0 0 0 0 


then


0 0 0 1 


.0 0 0 0 ^4 

0 0 0 0 

Vo 0 0 0/ 


13. Let >1 = M®' {id) and B = A/«- {id). 
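
The remark in answer 6 that a scalar matrix C = zI satisfies AC = CA = zA can be checked on a small case. This is a sketch only: the matrix A and the number z below are hypothetical, and `matmul` and `scalar_matrix` are helper names introduced for the example.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def scalar_matrix(z, n):
    """The n x n matrix zI: z on the diagonal, 0 elsewhere."""
    return [[z if i == j else 0 for j in range(n)] for i in range(n)]

A = [[1, 2], [3, 4]]   # hypothetical 2 x 2 matrix
z = 5                  # hypothetical number
C = scalar_matrix(z, 2)

AC = matmul(A, C)
CA = matmul(C, A)
zA = [[z * entry for entry in row] for row in A]
print(AC, CA, zA)
```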


Chapter XII, §6 
1. (a) 2 (b) 1 (c) 1 (d) 0





Chapter XIII, §2





5. 4z — Ay — 2 = 0 
8. —r 


6. X = -1 7. y 2 = _1 

9. -ρ^2 sin φ


Chapter XIV, §2 

1. (a) —20 (b) 5 (c) 4 (d) 5 (e) —76 

2. (a) -18 (b) 45 (c) 0 (d) 0 

^ 7. <2 + 5 


Chapter XIV, 

3. The product a11 a22 ··· ann


Chapter XIV, §4





ad ~ be (-C o) 

Chapter XIV, §8 


4. (a) 1 (b) 1 (c) -1 (d) 1 (e) 1

5. (a) 

1 

2 

3 ' 


(b) 

1 

2 

3 


(c) 


.3 

1 

2. 



.2 

3 

1. 


. 

(d) 

1 

2 

3 

4 

(e) 

1 

2 

3 

4 

(0 


.3 

1 

2 

4. 


.2 

1 

4 

3. 

. 


Chapter XV, §1


• 10 10* 

(b) 2

(c) -1 + 3i

(d) -1 + 3i

(e) 6π + (7 + π^2)i

(f) -2π + πi

(g) -3√2 + √2πi

(h) -8 - 6i


^^^To-To 



(e) i - i 

(0 i+i 



3. 1, a 


Chapter XV, §2 


1. (a) y/2e'^'* 

(b) Vs 

(c) 3c" 

(d) 4e"'" 

(e) 

(f) 5e-"^ 

(g) 7e" 

(h) Vs e'“"‘ 

2. (a) -I 

(b) - l + 

(c) ~ + ~i 

\/2 y/2 

(d) ^ ^ 

.X 1 . V^- 

(e) j+^. 

(0 -• 

(g) -I 

(h) - 

Vs Vs 




1 

2 3 
2 1 . 

2 3 4 
2 1 3 




4. If a = re^(iθ), then the n-th roots are



7. All u such that u = z + 2nπi, n integer

8. All z = 2nπi 11. (c) 0

13. (a) 1 - 3i (b) -6 - 4i 14. Yes
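
For answer 4, a sketch assuming the standard formula for the n-th roots of a = re^(iθ), namely r^(1/n) e^(i(θ + 2πk)/n) for k = 0, ..., n - 1. The function name `nth_roots` and the sample value a = -8 are illustrative choices, not taken from the text.

```python
import cmath
import math

def nth_roots(a, n):
    """Return the n complex n-th roots of a nonzero complex number a."""
    r, theta = cmath.polar(a)   # a = r * e^(i*theta)
    return [cmath.rect(r ** (1.0 / n), (theta + 2 * math.pi * k) / n)
            for k in range(n)]

# Cube roots of -8: each has absolute value 2, and one of them is -2.
roots = nth_roots(-8 + 0j, 3)
for z in roots:
    print(z, z ** 3)
```

Raising each returned value to the n-th power recovers a up to rounding error, which is a convenient way to test the formula.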



Index 





Absolute value, 195 
Acceleration, 21 
Angle between vectors, 12 

Basis, 109 
Beginning point, 4 
Boundary point, 82 
Bounded, 83 

Circle, 31 
Closed, 83 
Closed ball, 31 
Column, 113 
Column rank, 141 
Complex number, 194 
Composite, 154 
Conjugate, 195 
Conservative, 47 
Continuous, 207 
Coordinate, 2 

Coordinates (wrt basis), 110 
Cramer’s rule, 181 
Critical point, 79 
Cross product, 17 
Curve, 19 

Cylindrical coordinates, 99 

Derivative, 20, 166 
Determinant, 177 
Differentiable curve, 19 
Differentiable function, 36 
Differential operator, 67 
Dimension, 120, 126 
Direction, 5 

Directional derivative, 46 
Distance, 10 
Dot product, 6 

End point, 4 
Equivalent vectors, 5 


Generate, 107 
Gradient, 33 

Homogeneous, 69 
Homogeneous linear equation 
Hyperplane, 14

Identity, 133 
Image, 129, 139 
Integrable, 90 
Interior point, 82 
Inverse, 184 

Jacobian determinant, 174 
Jacobian matrix, 166 

Kernel, 138 
Kinetic energy, 48 

Length, 9

Length of curve, 23
Level curve, 29 
Limit, 20.’} 

Linear combination, 107 
Linear map, 132 
Line integral, 56 
Local max, 80 
Local min, 80
Located vector, 4
Lower sum, 89 

Mapping, 129 
Matrix, 113 

Non-singular, 184 
Non-trivial solution, 117 
Norm, 9 
Normal, 14 

Normed vector space, 205



Open ball, 31 
Open set, 32 
Opposite direction, 5 
Orthogonal, 122 
Orthogonal map, 144 
Orthonormal, 123 
Osculating plane, 27 

Parallel, 5 

Partial derivative, 32 
Permutation, 185 
Perpendicular, 6, 7 
Piecewise differentiable, 58 
Plane, 14 
Point, 2 
Polar form, 197 
Polynomial approximation, 73 
Potential function, 50 
Projection, 11 

Quadratic form, 81 

Rank, 142 

Rectangular coordinates, 99 
Repeated integral, 91
Row, 113 
Row rank, 141 

Scalar product, 6 
Schwarz inequality, 8 


Simple differential operator, 65 
Smooth, 90 
Speed, 22 
Sphere, 31 

Spherical coordinates, 100 
Square matrix, 114 
Subset, 105 
Subspace, 106 
Surface, 43 

Tangent linear map, 165 
Tangent plane, 44 
Tangent vector, 21 
Transpose, 115 
Transposition, 186 
Triangle inequality, 10 
Trivial solution, 117 

Unit matrix, 152 
Unit vector, 10 
Upper sum, 89 

Value, 129 
Vector, 5 
Vector field, 47 
Vector space, 105 
Velocity, 21 

Zero map, 133 
Zero matrix, 114 


PRINTED IN JAPAN
TOKYO 






